
XML DOM in Java

Lecture 6

XML DOM video: https://www.youtube.com/watch?


XML Parser
 XML parsing is the process of reading an XML document
and providing an interface through which the user application
can access the document.
 An XML parser is a software component that reads the XML
document, extracts its content according to its structure, and
exposes programming interfaces (APIs) to the application.
 Most XML parsers check the well-formedness of the XML
document, and many can also validate the document against
a DTD or an XML Schema.
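The well-formedness check mentioned above can be sketched with the JAXP DOM parser. This is a minimal illustration, not a definitive implementation: the class name WellFormedCheck and the method isWellFormed are hypothetical, and the XML is read from an in-memory string rather than a file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Hypothetical helper: true if the parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser reports a well-formedness violation as a SAXException.
            return false;
        } catch (Exception e) {
            // Configuration or I/O problems are not well-formedness results.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b>text</b></a>")); // true
        System.out.println(isWellFormed("<a><b>text</a>"));     // false: <b> is never closed
    }
}
```

This sketch only checks well-formedness; validation against a DTD or XML Schema needs additional factory configuration that is not shown here.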
Java XML Parser
 XML parsing APIs available in Java through JAXP:
 DOM (Document Object Model)
 SAX (Simple API for XML)
 StAX (Streaming API for XML)

[Diagram: XML Parser
   In-Memory Tree: DOM
   Event Processing: Push Parser (SAX), Pull Parser (StAX)]
Type of Java XML Parser
 In Memory Tree:
 Java DOM parser traverses the XML file and creates the DOM
objects (corresponding to nodes in XML file).
 The entire document is read into memory as a tree structure.
 These DOM objects are linked together, which allows random
access to any part of the document.
 Event Processing:
 The parser reads an XML document from the beginning to the
end.
 When it encounters a node in the document, it generates an event
that triggers the corresponding event handler for that node.
 The handler thus applies the application logic to process the
node specifically.
Type of Java XML Parser cont..
 SAX:
 The SAX is a push model API – it is the API which calls the
event handler.
 The SAX parser thus “pushes” events into the handler.
 Once the parser has started, it has to iterate all the way to the
end, calling the handler for every XML event in the
XML document.

 StAX:
 The StAX is a pull model API – it is the event handler that calls
the parser API.
 Hence, the handler class controls when the parser is to move on
to the next event in the XML document.
 Hence, the parsing can be stopped at any point.
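The difference between the push and pull models can be sketched side by side. In this illustrative example (the class PushVersusPull, its method names, and the sample XML are assumptions, not part of the lecture), the SAX parser calls the handler for every element, while the StAX loop asks for the next event itself and returns as soon as it has what it needs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PushVersusPull {
    static final String XML = "<list><item>a</item><item>b</item><item>c</item></list>";

    // SAX (push): the parser drives the loop and calls startElement for every element.
    static List<String> saxElementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes atts) {
                            names.add(qName);
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    // StAX (pull): the application requests each event and may stop early.
    static String firstItemText(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "item".equals(reader.getLocalName())) {
                        return reader.getElementText(); // remaining items are never read
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(saxElementNames(XML)); // [list, item, item, item]
        System.out.println(firstItemText(XML));   // a
    }
}
```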
Type of Java XML Parser cont..
Feature                                  DOM              SAX               StAX
API Type                                 In-memory tree   Push, streaming   Pull, streaming
Ease of Use                              High             Medium            High
XPath Capability                         Yes              No                No
CPU and Memory Efficiency                Varies           Good              Good
Forward Only                             No               Yes               Yes
Read XML                                 Yes              Yes               Yes
Write XML                                Yes              No                Yes
Create, Read, Update or Delete Nodes     Yes              No                No
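The XPath capability listed for DOM can be illustrated with the standard javax.xml.xpath package, which evaluates expressions against an in-memory DOM tree. The following is a small sketch under stated assumptions: the class XPathOnDom, its methods, and the inline sample document are hypothetical, and the type value is concatenated into the expression purely for brevity (it must not contain a quote character).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathOnDom {
    static final String XML =
            "<record>"
          + "<magazine type=\"computer\"><title>XML and Java</title></magazine>"
          + "<magazine type=\"car\"><title>Car of the Year</title></magazine>"
          + "</record>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the title of the first magazine with the given type, or "" if none matches.
    static String titleForType(Document doc, String type) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("/record/magazine[@type='" + type + "']/title", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts the magazine elements using the XPath count() function.
    static int magazineCount(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double n = (Double) xpath.evaluate("count(/record/magazine)", doc,
                    XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(titleForType(doc, "car")); // Car of the Year
        System.out.println(magazineCount(doc));       // 2
    }
}
```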
Parsing Package in Java

Package               Description
javax.xml.parsers     The JAXP APIs, which provide a common interface for different vendors' SAX and DOM parsers.
org.w3c.dom           Defines the DOM programming interfaces for XML documents, as specified by the W3C.
org.xml.sax           Defines the basic SAX APIs.
javax.xml.transform   Defines the XSLT APIs that enable the transformation of XML into other forms.
javax.xml.stream      Defines the StAX APIs for streaming reading and writing of XML.
JAXP
 JAXP (Java API for XML Processing) – Java API
to process the XML data using Java applications.
 It is easy to use and vendor-neutral.
 It supports the DOM and SAX standards.
 The main JAXP APIs are defined in the package
javax.xml.parsers.
JAXP in DOM
• JAXP provides DocumentBuilder to load an XML
document as a DOM Document object (DOM Tree).

[Diagram: DocumentBuilderFactory creates a Document Builder; the Document Builder reads the XML data and produces a DOM Document object (DOM Tree)]
JAXP in DOM
• DocumentBuilder – It defines the API to obtain DOM
Document object from an XML document.
• DocumentBuilderFactory – It enables applications to
obtain a parser that produces DOM object trees from
XML documents

DOM Tree

Picture source: https://programming-tips.jp/archives/202205/13/index.html


Introduction of DOM
 When an XML file is parsed using a DOM parser, it produces
a DOM tree (a hierarchical tree structure in memory), the
Document object.
 All elements in the DOM tree are represented as objects
that correspond to the hierarchy of the processed XML
document.
 DOM is an in-memory tree-based object representation of
XML documents that enables programmatic access to
their elements and attributes.
 The DOM is a W3C (World Wide Web Consortium)
standard.
 The Document interface and all related interfaces are
located in the Java package org.w3c.dom.
Introduction of DOM cont..
 Advantages:
 DOM gives a possibility to navigate the tree structure,
change elements and attributes, and create new XML
documents in memory.

 Disadvantages:
 DOM parsers are generally slower than SAX parsers and
consume more memory, because the whole document is held in memory.
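As an illustration of the advantage above (changing elements and attributes in memory), here is a minimal sketch. The class DomEditSketch, the new title text, and the lang attribute are made-up examples; only the DOM calls themselves come from the standard org.w3c.dom API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomEditSketch {
    static final String XML =
            "<bookstore><book ISBN=\"101223547\">"
          + "<title>Data Structure</title><year>2020</year>"
          + "</book></bookstore>";

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Applies one update, one attribute creation and one deletion to a <book> element.
    static void edit(Element book) {
        Node title = book.getElementsByTagName("title").item(0);
        title.setTextContent("Data Structures");   // update the text of an element
        book.setAttribute("lang", "en");            // add a new attribute
        Node year = book.getElementsByTagName("year").item(0);
        book.removeChild(year);                     // delete a child element
    }

    public static void main(String[] args) {
        Element root = parseRoot(XML);
        Element book = (Element) root.getElementsByTagName("book").item(0);
        edit(book);
        System.out.println(book.getElementsByTagName("title").item(0).getTextContent());
        System.out.println(book.getAttribute("lang"));
        System.out.println(book.getElementsByTagName("year").getLength());
    }
}
```

All changes happen on the in-memory tree only; nothing is written back to a file unless the tree is serialized afterwards.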
DOM Tree
XML Document:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
   <book ISBN="101223547">
      <title>Data Structure</title>
      <author>Willian Wong</author>
      <year>2020</year>
   </book>
</bookstore>

DOM Tree:
Root element: <bookstore>
   Element: <book> (Attribute: "ISBN")
      Element: <title>
         Text: Data Structure
      Element: <author>
         Text: Willian Wong
      Element: <year>
         Text: 2020
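The tree shown above can be reproduced programmatically by walking the DOM recursively. This is a sketch only: the class DomOutline and its methods are hypothetical, and the bookstore document is written on one line without whitespace so that no extra whitespace text nodes appear in the output.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomOutline {
    static final String XML =
            "<bookstore><book ISBN=\"101223547\"><title>Data Structure</title>"
          + "<author>Willian Wong</author><year>2020</year></book></bookstore>";

    // Parses the XML text and returns an indented outline of its DOM tree.
    static String outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Appends one line for this node, then recurses into its children.
    private static void walk(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append("Element: ").append(node.getNodeName());
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                out.append(" [Attribute: ").append(attrs.item(i).getNodeName()).append(']');
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            out.append("Text: ").append(node.getNodeValue());
        } else {
            out.append(node.getNodeName());
        }
        out.append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline(XML));
    }
}
```

Note that the text of each element is a separate Text child node, and that attributes are not children of the element; they are reached through getAttributes().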
DOM Interface
Interface       Description
Document        Represents the XML document's top-level node. It gives access to all the document's nodes (including the root element).
Node            Represents an XML document node.
NodeList        Represents a read-only, ordered list of Node objects.
Element         Represents an element node. Derives from Node.
Attr            Represents an attribute node. Derives from Node.
CharacterData   Represents character data. Derives from Node.
Text            Represents a text node. Derives from CharacterData.
Comment         Represents a comment node, i.e., all the characters between the starting '<!--' and ending '-->'. Derives from CharacterData.
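The relationship between these interfaces can be observed on a parsed document. In the sketch below (the class CharacterDataSketch and the sample note document are assumptions made for illustration), both Comment and Text nodes are handled through their common CharacterData parent interface, while Element nodes are handled separately.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Comment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CharacterDataSketch {
    static final String XML = "<note><!--draft--><to>Ali</to>Hello</note>";

    // Describes each child of the root element by its DOM interface.
    static List<String> describeChildren(String xml) {
        List<String> result = new ArrayList<>();
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child instanceof CharacterData) {
                    // Comment and Text both derive from CharacterData.
                    String kind = (child instanceof Comment) ? "Comment" : "Text";
                    result.add(kind + ":" + ((CharacterData) child).getData());
                } else if (child instanceof Element) {
                    result.add("Element:" + ((Element) child).getTagName());
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(describeChildren(XML)); // [Comment:draft, Element:to, Text:Hello]
    }
}
```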
DOM Interface cont..

Picture source: https://www.codevoila.com/post/62/xml-processing-in-java-jaxp-dom-example


DOM Parsing
Basic operation:
 Import DOM parser packages
import org.w3c.dom.*;
import javax.xml.parsers.*;
import java.io.*;

 Create JAXP DocumentBuilder object
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();

 Create Document object from XML file
// reads the XML file into a Document object
Document doc = builder.parse(new File( file ));


DOM Parsing cont..
Basic operation:
 Normalize the XML Structure
 merges adjacent text nodes (for example, text that spans multiple lines)
 removes empty text nodes
doc.getDocumentElement().normalize();

Example:
<fruit>small buah cempedak</fruit>
After normalization:
Element: fruit
Text node: small buah cempedak

 Get the root node
Node root = doc.getDocumentElement();
 Get the root element
Element root = doc.getDocumentElement();

• A Node represents any component of an XML document
• An Element is a sub-interface of Node and represents an XML element
• A NodeList is an ordered collection of nodes
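The effect of normalize() is easiest to see on a tree that was built by hand, because a freshly parsed document often already has its text in a single node. The following sketch (class and method names are hypothetical) deliberately creates several adjacent text nodes, including an empty one, and counts the children before and after normalization.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeSketch {
    // Builds <fruit> with four adjacent text nodes, one of them empty.
    static Element buildFragmentedFruit() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element fruit = doc.createElement("fruit");
            doc.appendChild(fruit);
            fruit.appendChild(doc.createTextNode("small "));
            fruit.appendChild(doc.createTextNode(""));
            fruit.appendChild(doc.createTextNode("buah "));
            fruit.appendChild(doc.createTextNode("cempedak"));
            return fruit;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element fruit = buildFragmentedFruit();
        System.out.println(fruit.getChildNodes().getLength());  // 4
        fruit.getOwnerDocument().getDocumentElement().normalize();
        System.out.println(fruit.getChildNodes().getLength());  // 1
        System.out.println(fruit.getFirstChild().getNodeValue()); // small buah cempedak
    }
}
```

Note that getTextContent() returns the same string before and after; normalization changes the node structure, not the text.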
DOM Parsing cont..
Basic operation:
 Get the sub-elements
// returns a list of all descendant elements with the specified tag name
NodeList nList = doc.getElementsByTagName("elementName");

// returns a list of all child nodes
NodeList nList = root.getChildNodes();

 Get the attributes
// returns the value of a specific attribute
element.getAttribute("attributeName");

// returns a NamedNodeMap (a map of attribute names to attribute nodes)
element.getAttributes();

Example:
Node n = nList.item(0); // first node
Element en = (Element) n;
String type = en.getAttribute("id");
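Both ways of reading attributes can be sketched as follows. The class AttributesSketch and the sample pan element are illustrative assumptions. Because a NamedNodeMap does not promise any particular order, the sketch copies the attributes into a sorted map before printing.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

class AttributesSketch {
    static final String XML = "<pan id=\"p1\" brand=\"Carote\">Egg Pan</pan>";

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Copies all attributes of an element into a map sorted by attribute name.
    static Map<String, String> attributesOf(Element element) {
        Map<String, String> result = new TreeMap<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr attr = (Attr) attrs.item(i);
            result.put(attr.getName(), attr.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Element pan = parseRoot(XML);
        System.out.println(pan.getAttribute("id"));         // p1
        System.out.println(pan.getAttribute("size").isEmpty()); // true: missing attribute gives ""
        System.out.println(attributesOf(pan));              // {brand=Carote, id=p1}
    }
}
```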
DOM Parsing Process

Source: https://codebridgeplus.com/android-xml-parsing-using-dom-parser/
DOM Methods

Picture source: https://www.codevoila.com/post/62/xml-processing-in-java-jaxp-dom-example


Constant of Node Type

Picture source: http://www.w3ccoo.com/xml/dom_nodetype.asp


Java Program 1 and 2 (magazine.xml)
<?xml version="1.0" encoding="UTF-8"?>
<magazine>
<title type="computer">XML and Java</title>
<author>Willian Wong</author>
<date>June 2022</date>
<summary>Extensible Markup Language (XML) is a simple text form</summary>
</magazine>
Note:
DOM will load the entire XML into memory and
create a document tree object at once.
If the XML file is too large, the program /
application may crash due to an OOM (Out of
Memory) error.
Java Program 1
import javax.xml.parsers.*;
import org.w3c.dom.*;
import java.io.File;

public class Book1 {
    public static void main(String[] argv) {
        try {
            // Get Document Builder object
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();

            // Build Document object
            Document doc = builder.parse(new File("magazine.xml"));

            // Normalize the XML structure
            doc.getDocumentElement().normalize();

            // Extract the root node
            Node root = doc.getDocumentElement();
            System.out.println("Root element:" + root.getNodeName());

            // Display all the child nodes of the root node
            // (whitespace between elements appears as #text nodes)
            NodeList childN = root.getChildNodes();
            for (int i = 0; i < childN.getLength(); i++) {
                Node curN = childN.item(i);
                System.out.println(i + ". " + curN.getNodeName());
            }
        }
        catch (Exception e) { // It catches all the exceptions raised.
            System.out.println(e);
        }
    }
}
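Because getChildNodes() returns every child node, the loop in Java Program 1 also lists the whitespace between elements as #text nodes. One possible way to list only the element children is to filter by node type, as in this sketch. The class ElementChildren and the shortened inline version of the magazine document are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElementChildren {
    // Shortened magazine document, including the line breaks and indentation.
    static final String XML =
            "<magazine>\n"
          + "  <title>XML and Java</title>\n"
          + "  <author>Willian Wong</author>\n"
          + "</magazine>";

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the names of the child nodes that are elements, skipping text nodes.
    static List<String> elementChildNames(Element parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Element root = parseRoot(XML);
        System.out.println(root.getChildNodes().getLength()); // 5: three #text nodes, two elements
        System.out.println(elementChildNames(root));          // [title, author]
    }
}
```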
Java Program 2
import javax.xml.parsers.*;
import org.w3c.dom.*;
import java.io.File;

public class Book2 {
    public static void main(String[] argv) {
        try {
            // Get Document Builder object
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();

            // Build Document object
            Document doc = builder.parse(new File("magazine.xml"));

            // Normalize the XML structure
            doc.getDocumentElement().normalize();

            // Extract the root node
            Node root = doc.getDocumentElement();
            System.out.println("Root Element: " + root.getNodeName());
            System.out.printf("=========================\n");

            // Query by tag name: returns a NodeList of all elements
            // with the specified name
            NodeList NL = doc.getElementsByTagName("magazine");
            for (int i = 0; i < NL.getLength(); i++) {
                Node n = NL.item(i);

                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    Element en = (Element) n;
                    // In magazine.xml the type attribute belongs to <title>, not <magazine>
                    Element titleEl = (Element) en.getElementsByTagName("title").item(0);
                    String title = titleEl.getTextContent();
                    String type = titleEl.getAttribute("type");
                    String author = en.getElementsByTagName("author").item(0).getTextContent();
                    String date = en.getElementsByTagName("date").item(0).getTextContent();
                    String summary = en.getElementsByTagName("summary").item(0).getTextContent();

                    System.out.println("Current Element:" + n.getNodeName());
                    System.out.println("Type: " + type);
                    System.out.println("Title: " + title);
                    System.out.println("Author: " + author);
                    System.out.println("Date: " + date);
                    System.out.println("Summary: " + summary + "\n");
                }
            }
        }
        catch (Exception e) { // It catches all the exceptions raised.
            System.out.println(e);
        }
    }
}
Java Program 3 (magazine2.xml)
<?xml version="1.0" encoding="UTF-8"?>
<record>
<magazine type="computer">
<title>XML and Java</title>
<author>Willian Wong</author>
<date>June 2022</date>
<summary>XML is a simple text format</summary>
</magazine>
<magazine type="car">
<title>Car of the Year</title>
<author>Peter Jakson</author>
<date>Sep 2022</date>
<summary>The hottest car in the world</summary>
</magazine>
</record>
Java Program 3
import javax.xml.parsers.*;
import org.w3c.dom.*;
import java.io.File;

public class Book3 {
    public static void main(String[] argv) {
        try {
            // Get Document Builder object
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();

            // Build Document object
            Document doc = builder.parse(new File("magazine2.xml"));

            // Normalize the XML structure
            doc.getDocumentElement().normalize();

            // Extract the root node
            Node root = doc.getDocumentElement();
            System.out.println("Root Element: " + root.getNodeName());
            System.out.printf("=========================\n");

            // Query by tag name
            NodeList NL = doc.getElementsByTagName("magazine");
            for (int i = 0; i < NL.getLength(); i++) {
                Node n = NL.item(i);

                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    Element en = (Element) n;
                    String title = en.getElementsByTagName("title").item(0).getTextContent();
                    // In magazine2.xml the type attribute belongs to <magazine>
                    String type = en.getAttribute("type");
                    String author = en.getElementsByTagName("author").item(0).getTextContent();
                    String date = en.getElementsByTagName("date").item(0).getTextContent();
                    String summary = en.getElementsByTagName("summary").item(0).getTextContent();

                    System.out.println("Current Element:" + n.getNodeName());
                    System.out.println("Type: " + type);
                    System.out.println("Title: " + title);
                    System.out.println("Author: " + author);
                    System.out.println("Date: " + date);
                    System.out.println("Summary: " + summary + "\n");
                }
            }
        }
        catch (Exception e) { // It catches all the exceptions raised.
            System.out.println(e);
        }
    }
}
Create XML Document
• The DOM Document object (DOM Tree) can be saved to
the XML document through the JAXP.

[Diagram: TransformerFactory creates a Transformer; the Transformer, configured with output properties, converts the Source (DOM Tree) into the Output (XML data)]
Create XML Document cont..
Basic operation:
 Import DOM parser packages
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;
import java.io.File;

 Create JAXP DocumentBuilder object


DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();

 Create DOM Document object


Document doc = dBuilder.newDocument();
Create XML Document cont..
Basic operation:
 Create root element
Element rootElement = doc.createElement("cookware");
doc.appendChild(rootElement);
 Create sub-element
Element pan1 = doc.createElement("pan");
rootElement.appendChild(pan1);
 Create attribute
Attr attr = doc.createAttribute("brand");
attr.setValue("Carote");
pan1.setAttributeNode(attr);
 Create text content
pan1.appendChild(doc.createTextNode("Egg Pan"));

Resulting XML:
<cookware>
   <pan brand="Carote">Egg Pan</pan>
</cookware>
Create XML Document cont..
Basic operation:
 Create JAXP Transform object
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();

 Pretty print the XML


transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "3");

 Transform the DOM Document object to XML document


DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new File("cookware.xml"));
transformer.transform(source, result);
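The same Transformer can also write to a string instead of a file, which is convenient for testing or logging. The sketch below is one possible variant, not the lecture's own program: the class DomToString is hypothetical, it uses Element.setAttribute as a shorthand for creating an Attr node, and it omits the XML declaration and indentation to keep the output on one line.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomToString {
    // Builds the small cookware document in memory.
    static Document buildCookware() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("cookware");
            doc.appendChild(root);
            Element pan = doc.createElement("pan");
            pan.setAttribute("brand", "Carote"); // shorthand for createAttribute + setAttributeNode
            pan.appendChild(doc.createTextNode("Egg Pan"));
            root.appendChild(pan);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes a DOM document to a string using an identity Transformer.
    static String serialize(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(buildCookware()));
    }
}
```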
Java Program 4
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;
import java.io.File;

public class CreateXML {
    public static void main(String[] argv) {
        try {
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.newDocument();

            // root element
            Element rootElement = doc.createElement("cookware");
            doc.appendChild(rootElement);

            // pan element
            Element pan1 = doc.createElement("pan");
            rootElement.appendChild(pan1);

            // setting attribute to element
            Attr attr = doc.createAttribute("brand");
            attr.setValue("Carote");
            pan1.setAttributeNode(attr);

            // text content
            pan1.appendChild(doc.createTextNode("Egg Pan"));

            // create the Transformer object
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();

            // pretty print the XML
            // (indent-amount is an implementation-specific, Xalan-style property)
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "3");

            // write the content into xml file
            DOMSource source = new DOMSource(doc);
            StreamResult result = new StreamResult(new File("cookware.xml"));
            transformer.transform(source, result);

            // Output to console for testing
            StreamResult consoleResult = new StreamResult(System.out);
            transformer.transform(source, consoleResult);
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Java Program 5
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;
import java.io.File;

public class CreateXML2 {
    public static void main(String[] argv) {
        try {
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.newDocument();

            // root element
            Element rootElement = doc.createElement("cookware");
            doc.appendChild(rootElement);

            // pan element
            Element pan1 = doc.createElement("pan");
            rootElement.appendChild(pan1);

            // setting attribute to element
            Attr attr = doc.createAttribute("brand");
            attr.setValue("Carote");
            pan1.setAttributeNode(attr);

            // first name element
            Element panname1 = doc.createElement("name");
            Attr attrType1 = doc.createAttribute("type");
            attrType1.setValue("24cm");
            panname1.setAttributeNode(attrType1);
            panname1.appendChild(doc.createTextNode("Non Stick Frying Pan"));
            pan1.appendChild(panname1);

            // second name element
            Element panname2 = doc.createElement("name");
            Attr attrType2 = doc.createAttribute("type");
            attrType2.setValue("18cm");
            panname2.setAttributeNode(attrType2);
            panname2.appendChild(doc.createTextNode("Non Stick 4 in 1 Egg Pan"));
            pan1.appendChild(panname2);

            // create the Transformer object
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();

            // pretty print XML
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "3");

            // write the content into xml file
            DOMSource source = new DOMSource(doc);
            StreamResult result = new StreamResult(new File("cookware.xml"));
            transformer.transform(source, result);

            // Output to console for testing
            StreamResult consoleResult = new StreamResult(System.out);
            transformer.transform(source, consoleResult);
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}
