TCP Lec06

XML DOM in Java
Lecture 6
XML DOM video: https://www.youtube.com/watch?

XML Parser
 XML parsing is the process of reading an XML document
and providing an interface through which the user application
can access the document.
 An XML parser is a software component that reads the XML document,
extracts its content according to its structure, and exposes
programming interfaces to the user application.
 Most XML parsers check the well-formedness of the XML
document, and many can also validate the document against
a DTD or XML schema.
Java XML Parser
 XML parser APIs available in Java through JAXP:
 DOM (Document Object Model)
 SAX (Simple API for XML)
 StAX (Streaming API for XML)
[Figure: XML parsers divide into in-memory tree parsers (DOM) and event-processing parsers. Event-processing parsers divide into push parsers (SAX) and pull parsers (StAX).]
Type of Java XML Parser
 In Memory Tree:
 Java DOM parser traverses the XML file and creates the DOM
objects (corresponding to nodes in XML file).
 The entire document is read into memory as a tree structure.
 These DOM objects are linked together, which allows random
access to any part of the document.
 Event Processing:
 The parser reads an XML document from the beginning to the
end.
 When it encounters a node in the document, it generates an event
that triggers the corresponding event handler for that node.
 The handler thus applies the application logic to process the
node specifically.
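The event-processing model described above can be sketched with a SAX handler. This is a minimal illustration, not part of the original slides: the class name SaxEventSketch and the helper collectEvents are made up for the example, and the XML is a shortened version of the magazine data used later in this lecture.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventSketch {

    // Parses the XML text and records one entry per start/end-element event.
    static List<String> collectEvents(String xml) {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                events.add("start " + qName);   // invoked by the parser, not by our code
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end " + qName);
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return events;
    }

    public static void main(String[] args) {
        String xml = "<magazine><title>XML and Java</title></magazine>";
        for (String event : collectEvents(xml)) {
            System.out.println(event);
        }
    }
}
```

The handler never asks for the next node. It only reacts when the parser reports one, and the events arrive in document order.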
Type of Java XML Parser cont..
 SAX:
 SAX is a push-model API: it is the parser that calls the
event handler.
 The SAX parser thus “pushes” events into the handler.
 Once the parser is started, it has to iterate all the way to the
end, calling the handler for each and every XML event in the
XML document.
 StAX:
 StAX is a pull-model API: it is the event-handling code that calls
the parser API.
 Hence, the handler class controls when the parser moves on
to the next event in the XML document.
 Hence, parsing can be stopped at any point.
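The pull model can be sketched as follows. This is an illustrative example under assumptions: the class StaxPullSketch and the helper firstTitle are invented here, and the XML mirrors the magazine records used later in the lecture.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxPullSketch {

    // Returns the text of the first <title> element, then stops asking for events.
    static String firstTitle(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();          // the application pulls the next event
                    if (event == XMLStreamConstants.START_ELEMENT
                            && "title".equals(reader.getLocalName())) {
                        return reader.getElementText(); // stop here; later events are never requested
                    }
                }
                return null;                            // no <title> element found
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("StAX parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<record>"
                + "<magazine><title>XML and Java</title></magazine>"
                + "<magazine><title>Car of the Year</title></magazine>"
                + "</record>";
        System.out.println(firstTitle(xml));
    }
}
```

Because the loop is in the application code, returning from it is all that is needed to stop parsing early.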
Type of Java XML Parser cont..
Feature                                 DOM              SAX               StAX
API Type                                In-memory tree   Push, streaming   Pull, streaming
Ease of Use                             High             Medium            High
XPath Capability                        Yes              No                No
CPU and Memory Efficiency               Varies           Good              Good
Forward Only                            No               Yes               Yes
Read XML                                Yes              Yes               Yes
Write XML                               Yes              No                Yes
Create, Read, Update or Delete Nodes    Yes              No                No
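The "XPath Capability" row can be illustrated with the standard javax.xml.xpath package, which evaluates expressions against a DOM tree. This is a sketch, not part of the original slides: the class XPathOnDomSketch and the helper evaluate are hypothetical, and the XML is a shortened form of the magazine records used later.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathOnDomSketch {

    static final String XML = "<record>"
            + "<magazine type=\"computer\"><title>XML and Java</title></magazine>"
            + "<magazine type=\"car\"><title>Car of the Year</title></magazine>"
            + "</record>";

    // Builds the DOM tree, then evaluates the XPath expression against it as a string.
    static String evaluate(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("/record/magazine[@type='car']/title"));
        System.out.println(evaluate("/record/magazine[1]/@type"));
    }
}
```

XPath needs random access to the whole tree, which is why it pairs naturally with DOM rather than with the forward-only streaming parsers.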
Parsing Package in Java
Package               Description
javax.xml.parsers     The JAXP APIs, which provide a common interface for
                      different vendors' SAX and DOM parsers.
org.w3c.dom           Defines the DOM programming interfaces for XML
                      documents, as specified by the W3C.
org.xml.sax           Defines the basic SAX APIs.
javax.xml.transform   Defines the XSLT APIs that enable the transformation of XML
                      into other forms.
javax.xml.stream      Defines the StAX APIs for streaming reading and writing of XML.
JAXP
 JAXP (Java API for XML Processing) is the Java API
for processing XML data in Java applications.
 It is easy to use and vendor-neutral.
 It supports the DOM and SAX standards, as well as StAX and XSLT.
 The main JAXP APIs are defined in the package
javax.xml.parsers.
JAXP in DOM
• JAXP provides DocumentBuilder to load an XML
document as a DOM Document object (DOM Tree).
[Figure: DocumentBuilderFactory creates a DocumentBuilder, which reads the XML document data and builds the DOM Document object (DOM tree).]
JAXP in DOM
• DocumentBuilder – It defines the API to obtain a DOM
Document object from an XML document.
• DocumentBuilderFactory – It enables applications to
obtain a parser that produces DOM object trees from
XML documents.
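A minimal sketch of the factory and builder roles described above, parsing XML held in a String rather than a file. The class DocumentBuilderSketch and its helpers are assumptions made for illustration. The second call in main also shows the well-formedness check mentioned at the start of the lecture.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DocumentBuilderSketch {

    // Factory -> builder -> Document; parsing fails if the text is not well-formed.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the XML text", e);
        }
    }

    static String rootName(String xml) {
        return parse(xml).getDocumentElement().getNodeName();
    }

    public static void main(String[] args) {
        System.out.println(rootName("<bookstore><book/></bookstore>"));
        try {
            parse("<bookstore><book></bookstore>");   // <book> is never closed
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: not well-formed");
        }
    }
}
```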
DOM Tree
Picture source: https://programming-tips.jp/archives/202205/13/index.html

Introduction of DOM
 When an XML file is parsed with a DOM parser, the parser produces
a DOM tree (a hierarchical tree structure in memory), which is
represented by a Document object.
 Every element in the DOM tree is represented as an object, and
the objects are arranged to match the hierarchy of the parsed XML
document.
 DOM is an in-memory tree-based object representation of
XML documents that enables programmatic access to
their elements and attributes.
 The DOM is a W3C (World Wide Web Consortium)
standard.
 The Document interface and all related interfaces are
located in the Java package org.w3c.dom.
Introduction of DOM cont..
 Advantages:
 DOM makes it possible to navigate the tree structure,
change elements and attributes, and create new XML
documents in memory.
 Disadvantages:
 DOM parsers are generally slower than SAX parsers and
consume more memory, because the whole document is held in memory.
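The advantage listed above (changing elements and attributes in memory) can be sketched as follows. The class DomEditSketch, the new ISBN value, and the price element are all hypothetical and serve only to illustrate the update, delete, and create operations on a tree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomEditSketch {

    static final String XML = "<bookstore><book ISBN=\"101223547\">"
            + "<title>Data Structure</title><year>2020</year></book></bookstore>";

    // Parses the document, edits the <book> element in memory, and returns it.
    static Element editedBook() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            Element book = (Element) doc.getElementsByTagName("book").item(0);

            // Update: overwrite an existing attribute (the new value is made up)
            book.setAttribute("ISBN", "000000000");

            // Delete: detach the <year> element from its parent
            Node year = book.getElementsByTagName("year").item(0);
            book.removeChild(year);

            // Create: append a new <price> element (hypothetical) as the last child
            Element price = doc.createElement("price");
            price.setTextContent("45.00");
            book.appendChild(price);
            return book;
        } catch (Exception e) {
            throw new IllegalStateException("DOM editing failed", e);
        }
    }

    public static void main(String[] args) {
        Element book = editedBook();
        System.out.println("ISBN: " + book.getAttribute("ISBN"));
        System.out.println("year elements: " + book.getElementsByTagName("year").getLength());
        System.out.println("last child: " + book.getLastChild().getNodeName());
    }
}
```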
DOM Tree
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book ISBN="101223547">
<title>Data Structure</title>
<author>Willian Wong</author>
<year>2020</year>
</book>
</bookstore>
DOM Tree
Root element: <bookstore>
  Element: <book>
    Attribute: "ISBN"
    Element: <title>
      Text: Data Structure
    Element: <author>
      Text: Willian Wong
    Element: <year>
      Text: 2020
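The tree above can be reproduced by walking the DOM recursively. This is a sketch under assumptions: the class DomTreeWalkSketch and its methods are invented for the example, and the bookstore XML is written on one line so that no whitespace-only text nodes appear.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomTreeWalkSketch {

    static final String XML = "<bookstore><book ISBN=\"101223547\">"
            + "<title>Data Structure</title><author>Willian Wong</author>"
            + "<year>2020</year></book></bookstore>";

    // Appends one line per element, attribute and non-blank text node, indented by depth.
    static void walk(Node node, String indent, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append(indent).append("Element: ").append(node.getNodeName()).append('\n');
            NamedNodeMap attributes = node.getAttributes();
            for (int i = 0; i < attributes.getLength(); i++) {
                out.append(indent).append("  Attribute: ")
                   .append(attributes.item(i).getNodeName()).append('\n');
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                out.append(indent).append("Text: ").append(text).append('\n');
            }
        }
        // Visit the children in document order
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, indent + "  ", out);
        }
    }

    static String describe() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), "", out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("DOM traversal failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```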
DOM Interface
Interface       Description
Document        Represents the XML document's top-level node. It provides access to
                all the document's nodes (including the root element).
Node            Represents a node in the XML document.
NodeList        Represents a read-only, ordered list of Node objects.
Element         Represents an element node. Derives from Node.
Attr            Represents an attribute node. Derives from Node.
CharacterData   Represents character data. Derives from Node.
Text            Represents a text node. Derives from CharacterData.
Comment         Represents a comment node, i.e., all the characters between the
                starting '<!--' and the ending '-->'. Derives from CharacterData.
DOM Interface cont..
Picture source: https://www.codevoila.com/post/62/xml-processing-in-java-jaxp-dom-example
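The "derives from" relationships in the interface table can be checked at run time. The sketch below is illustrative only: the class DomInterfaceSketch, its helpers, and the small note document are assumptions. In everyday code, getNodeType() is the more common way to tell node kinds apart.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

public class DomInterfaceSketch {

    static final String XML = "<note id=\"n1\"><!-- draft -->Hello</note>";

    // Names the most specific interface and notes whether it is also CharacterData.
    static String kindOf(Node node) {
        String kind;
        if (node instanceof Element) kind = "Element";
        else if (node instanceof Attr) kind = "Attr";
        else if (node instanceof Comment) kind = "Comment";
        else if (node instanceof Text) kind = "Text";
        else kind = "Node";
        return (node instanceof CharacterData) ? kind + " (CharacterData)" : kind;
    }

    // Describes the root element, its id attribute, and each of its child nodes.
    static List<String> kinds() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            Element root = doc.getDocumentElement();
            List<String> result = new ArrayList<>();
            result.add(kindOf(root));
            result.add(kindOf(root.getAttributeNode("id")));
            for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
                result.add(kindOf(child));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("DOM parsing failed", e);
        }
    }

    public static void main(String[] args) {
        for (String kind : kinds()) {
            System.out.println(kind);
        }
    }
}
```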

DOM Parsing
Basic operation:
 Import DOM parser packages
import org.w3c.dom.*;
import javax.xml.parsers.*;
import java.io.*;
 Create JAXP DocumentBuilder object
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
 Create Document object from the XML file
// reads the XML file into a Document object; file is the path to the XML file
Document doc = builder.parse(new File(file));

DOM Parsing cont..
Basic operation:
 Normalize the XML structure
 merges adjacent text nodes (for example, text that spans multiple lines) into a single text node
 removes empty text nodes
doc.getDocumentElement().normalize();
Example:
<fruit>small
buah
cempedak
</fruit>
Element: fruit
Text node: small buah cempedak
 Get the root node
Node root = doc.getDocumentElement();
 Get the root element
Element root = doc.getDocumentElement();
• A Node represents any component of an XML document (element, attribute, text, comment, and so on)
• An Element is a sub-interface of Node and represents an XML element
• A NodeList is an ordered collection of nodes
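The effect of normalize() is easiest to see on a tree built by hand, where adjacent text nodes can be created deliberately. This is a sketch: the class NormalizeSketch and the helper textNodeCounts are assumptions, and the fruit text is taken from the example above. Note that normalize() removes only empty text nodes. Text nodes that contain only whitespace are kept.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NormalizeSketch {

    // Returns the number of child nodes of <fruit> before and after normalize().
    static int[] textNodeCounts() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element fruit = doc.createElement("fruit");
            doc.appendChild(fruit);

            // Three adjacent text nodes, the last one empty
            fruit.appendChild(doc.createTextNode("small buah "));
            fruit.appendChild(doc.createTextNode("cempedak"));
            fruit.appendChild(doc.createTextNode(""));

            int before = fruit.getChildNodes().getLength();
            doc.getDocumentElement().normalize();   // merge adjacent text nodes, drop empty ones
            int after = fruit.getChildNodes().getLength();
            return new int[] { before, after };
        } catch (Exception e) {
            throw new IllegalStateException("Could not build the document", e);
        }
    }

    public static void main(String[] args) {
        int[] counts = textNodeCounts();
        System.out.println("Child nodes before normalize: " + counts[0]);
        System.out.println("Child nodes after normalize:  " + counts[1]);
    }
}
```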
DOM Parsing cont..
Basic operation:
 Get the sub-elements
//returns a list of all descendant elements with the specified tag name
NodeList nList = doc.getElementsByTagName("elementName");
//returns a list of all child nodes
NodeList nList = root.getChildNodes();
 Get the attributes
//returns the value of a specific attribute
element.getAttribute("attributeName");
//returns a NamedNodeMap (table) of attribute names/values
element.getAttributes();
Example:
Node n = nList.item(0); // first node
Element en = (Element) n;
String type = en.getAttribute("id"); // "id" is the attribute name
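The difference between getChildNodes() and getElementsByTagName() shows up as soon as the XML is indented, because the line breaks between elements become text nodes. The sketch below assumes a hypothetical class ChildNodesSketch and uses a shortened form of the magazine records from later in the lecture.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ChildNodesSketch {

    static final String XML =
            "<record>\n"
          + "  <magazine type=\"computer\"><title>XML and Java</title></magazine>\n"
          + "  <magazine type=\"car\"><title>Car of the Year</title></magazine>\n"
          + "</record>";

    static Element root() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parsing failed", e);
        }
    }

    // Counts every child node, including the whitespace text nodes between elements.
    static int childNodeCount() {
        return root().getChildNodes().getLength();
    }

    // Counts only the <magazine> elements.
    static int magazineCount() {
        return root().getElementsByTagName("magazine").getLength();
    }

    // Lists the attributes of the first <magazine> as name=value pairs.
    static String firstMagazineAttributes() {
        Element first = (Element) root().getElementsByTagName("magazine").item(0);
        NamedNodeMap attributes = first.getAttributes();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            if (i > 0) out.append(' ');
            out.append(attribute.getNodeName()).append('=').append(attribute.getNodeValue());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("getChildNodes():         " + childNodeCount());
        System.out.println("getElementsByTagName():  " + magazineCount());
        System.out.println("attributes of the first: " + firstMagazineAttributes());
    }
}
```

This is why code that loops over getChildNodes() usually checks getNodeType() before casting a node to Element.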
DOM Parsing Process
Source: https://codebridgeplus.com/android-xml-parsing-using-dom-parser/
DOM Methods
Picture source: https://www.codevoila.com/post/62/xml-processing-in-java-jaxp-dom-example

Constant of Node Type
Picture source: http://www.w3ccoo.com/xml/dom_nodetype.asp

Java Program 1 and 2 (magazine.xml)
<?xml version="1.0" encoding="UTF-8"?>
<magazine>
<title type="computer">XML and Java</title>
<author>Willian Wong</author>
<date>June 2022</date>
<summary>Extensible Markup Language (XML) is a simple text form</summary>
</magazine>
Note:
DOM loads the entire XML document into memory and
creates a document tree object at once.
If the XML file is too large, the program /
application may crash with an OOM (Out of
Memory) error.
Java Program 1
import org.w3c.dom.*;
import javax.xml.parsers.*;
import java.io.File;
public class Book1 {
    public static void main(String argv[]) {
        try {
            // Get DocumentBuilder object
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Build Document object
            Document doc = builder.parse(new File("magazine.xml"));
            // Normalize the XML structure
            doc.getDocumentElement().normalize();
            // Extract the root node
            Element root = doc.getDocumentElement();
            System.out.println("Root element:" + root.getNodeName());
Java Program 1 cont..
            // Display all the child nodes of the root node
            // (this includes the whitespace text nodes between the elements)
            NodeList childN = root.getChildNodes();
            Node curN;
            for (int i = 0; i < childN.getLength(); i++) {
                curN = childN.item(i);
                System.out.println(i + ". " + curN.getNodeName());
            }
        }
        catch (Exception e) { // catches any exception raised
            System.out.println(e);
        }
    }
}
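Java Program 2
import org.w3c.dom.*;
import javax.xml.parsers.*;
import java.io.File;
public class Book2 {
    public static void main(String argv[]) {
        try {
            // Get DocumentBuilder object
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Build Document object
            Document doc = builder.parse(new File("magazine.xml"));
            // Normalize the XML structure
            doc.getDocumentElement().normalize();
            // Extract the root node
            Element root = doc.getDocumentElement();
            System.out.println("Root Element: " + root.getNodeName());
            System.out.printf("=========================\n");
            // Query by tag name: returns a NodeList of all elements with the specified name
            NodeList NL = doc.getElementsByTagName("magazine");
            for (int i = 0; i < NL.getLength(); i++) {
                Node n = NL.item(i);
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    Element en = (Element) n;
                    Element titleEl = (Element) en.getElementsByTagName("title").item(0);
                    String title = titleEl.getTextContent();
                    // In magazine.xml the type attribute belongs to <title>, not <magazine>
                    String type = titleEl.getAttribute("type");
                    String author = en.getElementsByTagName("author").item(0).getTextContent();
                    String date = en.getElementsByTagName("date").item(0).getTextContent();
                    String summary = en.getElementsByTagName("summary").item(0).getTextContent();
                    System.out.println("Current Element:" + n.getNodeName());
                    System.out.println("Type: " + type);
                    System.out.println("Title: " + title);
                    System.out.println("Author: " + author);
                    System.out.println("Date: " + date);
                    System.out.println("Summary: " + summary + "\n");
                }
            }
        }
        catch (Exception e) {
            System.out.println(e);
        }
    }
}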
Java Program 3 (magazine2.xml)
<?xml version="1.0" encoding="UTF-8"?>
<record>
<magazine type="computer">
<title>XML and Java</title>
<author>Willian Wong</author>
<date>June 2022</date>
<summary>XML is a simple text format</summary>
</magazine>
<magazine type="car">
<title>Car of the Year</title>
<author>Peter Jakson</author>
<date>Sep 2022</date>
<summary>The hottest car in the world</summary>
</magazine>
</record>
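Java Program 3
import org.w3c.dom.*;
import javax.xml.parsers.*;
import java.io.File;
public class Book3 {
    public static void main(String argv[]) {
        try {
            // Get DocumentBuilder object
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Build Document object
            Document doc = builder.parse(new File("magazine2.xml"));
            // Normalize the XML structure
            doc.getDocumentElement().normalize();
            // Extract the root node
            Element root = doc.getDocumentElement();
            System.out.println("Root Element: " + root.getNodeName());
            System.out.printf("=========================\n");
            // Query by tag name: one iteration per <magazine> element
            NodeList NL = doc.getElementsByTagName("magazine");
            for (int i = 0; i < NL.getLength(); i++) {
                Node n = NL.item(i);
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    Element en = (Element) n;
                    String title = en.getElementsByTagName("title").item(0).getTextContent();
                    String type = en.getAttribute("type");
                    String author = en.getElementsByTagName("author").item(0).getTextContent();
                    String date = en.getElementsByTagName("date").item(0).getTextContent();
                    String summary = en.getElementsByTagName("summary").item(0).getTextContent();
                    System.out.println("Current Element:" + n.getNodeName());
                    System.out.println("Type: " + type);
                    System.out.println("Title: " + title);
                    System.out.println("Author: " + author);
                    System.out.println("Date: " + date);
                    System.out.println("Summary: " + summary + "\n");
                }
            }
        }
        catch (Exception e) {
            System.out.println(e);
        }
    }
}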
Create XML Document
• The DOM Document object (DOM Tree) can be saved to
the XML document through the JAXP.
[Figure: TransformerFactory creates a Transformer. The Transformer takes the source (DOM tree), applies the Transformer output properties, and writes the output (XML data).]
Create XML Document cont..
Basic operation:
 Import the JAXP transform packages (in addition to the DOM parser packages)
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
 Create JAXP DocumentBuilder object
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
 Create DOM Document object
Document doc = dBuilder.newDocument();
Basic operation:
 Create root element
Element rootElement = doc.createElement("cookware");
doc.appendChild(rootElement);
 Create sub-element
Element pan1 = doc.createElement("pan");
rootElement.appendChild(pan1);
 Create attribute
Attr attr = doc.createAttribute("brand");
attr.setValue("Carote");
pan1.setAttributeNode(attr);
 Create text content
pan1.appendChild(doc.createTextNode("Egg Pan"));
Resulting XML:
<cookware>
<pan brand="Carote">Egg Pan</pan>
</cookware>
Basic operation:
 Create JAXP Transformer object
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
 Pretty print the XML
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
// implementation-specific property, honoured by Xalan-based transformers
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "3");
 Transform the DOM Document object to XML document
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new File("cookware.xml"));
transformer.transform(source, result);
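The same Transformer can write to any StreamResult, not only to a file. The sketch below, which is not part of the original slides, serializes the cookware tree into a String through a StringWriter. The class DomToStringSketch and the helper toXml are assumptions, and indentation is left off so that the output stays on one line.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomToStringSketch {

    // Builds the cookware tree in memory and serializes it to a String.
    static String toXml() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement("cookware");
            doc.appendChild(root);
            Element pan = doc.createElement("pan");
            pan.setAttribute("brand", "Carote");   // shorthand for createAttribute + setAttributeNode
            pan.appendChild(doc.createTextNode("Egg Pan"));
            root.appendChild(pan);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Leave out the <?xml ...?> declaration so only the elements are written
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml());
    }
}
```

Writing to a String first is convenient for tests and logging, since the result can be inspected without touching the file system.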
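Java Program 4
import org.w3c.dom.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.File;
public class CreateXML {
    public static void main(String argv[]) {
        try {
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.newDocument();
            // root element
            Element rootElement = doc.createElement("cookware");
            doc.appendChild(rootElement);
            // pan element
            Element pan1 = doc.createElement("pan");
            rootElement.appendChild(pan1);
            // setting attribute to element
            Attr attr = doc.createAttribute("brand");
            attr.setValue("Carote");
            pan1.setAttributeNode(attr);
            // text content
            pan1.appendChild(doc.createTextNode("Egg Pan"));
            // create the Transformer object
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            // pretty print the XML
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "3");
            // write the content into xml file
            DOMSource source = new DOMSource(doc);
            StreamResult result = new StreamResult(new File("cookware.xml"));
            transformer.transform(source, result);
            // Output to console for testing
            StreamResult consoleResult = new StreamResult(System.out);
            transformer.transform(source, consoleResult);
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}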
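Java Program 5
import org.w3c.dom.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.File;
public class CreateXML2 {
    public static void main(String argv[]) {
        try {
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.newDocument();
            // root element (same as in Java Program 4)
            Element rootElement = doc.createElement("cookware");
            doc.appendChild(rootElement);
            // pan element
            Element pan1 = doc.createElement("pan");
            rootElement.appendChild(pan1);
            // setting attribute to element
            Attr attr = doc.createAttribute("brand");
            attr.setValue("Carote");
            pan1.setAttributeNode(attr);
            // panname element
            Element panname1 = doc.createElement("name");
            Attr attrType1 = doc.createAttribute("type");
            attrType1.setValue("24cm");
            panname1.setAttributeNode(attrType1);
            panname1.appendChild(doc.createTextNode("Non Stick Frying Pan"));
            pan1.appendChild(panname1);
            Element panname2 = doc.createElement("name");
            Attr attrType2 = doc.createAttribute("type");
            attrType2.setValue("18cm");
            panname2.setAttributeNode(attrType2);
            panname2.appendChild(doc.createTextNode("Non Stick 4 in 1 Egg Pan"));
            pan1.appendChild(panname2);
            // create the Transformer object
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            // pretty print XML
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "3");
            // write the content into xml file (file name reused from the earlier example)
            DOMSource source = new DOMSource(doc);
            StreamResult result = new StreamResult(new File("cookware.xml"));
            transformer.transform(source, result);
            // Output to console for testing
            StreamResult consoleResult = new StreamResult(System.out);
            transformer.transform(source, consoleResult);
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}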
