XML Parsers

Parsing

• XML parsing is required so that our application can inspect, retrieve and modify the document contents.
• An XML parser is a program that sits between the XML document and our application.
• To standardize the way parsers work, two specifications have emerged that spell out the interfaces an application can expect from a parser:
  • SAX, the Simple API for XML: processes the XML document a tag at a time and generates events.
  • DOM, the Document Object Model: describes the document as a tree data structure. It first loads the entire XML document as a tree; the application can then traverse and edit any node.

SAX Vs. DOM
• When it comes to fast, efficient reading of XML data, SAX is hard to beat. It requires little memory because it does not construct an internal representation (tree structure) of the XML data. Instead, it simply sends data to the application as it is read; your application can then do whatever it wants with the data it sees. But you can't go back to an earlier position or leap ahead to a different position.
• In general, SAX works well when you simply want to read data and have the application act on it.
• DOM is less suited to that kind of use, since it has to read the entire document before the application can act on it. It also requires more memory.
• But when you need to modify an XML structure, especially interactively, an in-memory structure like the Document Object Model (DOM) is the better fit.

• The Java API for XML Processing (JAXP) is for processing XML data using applications written in the Java programming language.
• JAXP leverages the parser standards SAX (Simple API for XML) and DOM (Document Object Model), so you can choose to parse your data as a stream of events or to build an object representation of it.
• JAXP also supports the XSLT (Extensible Stylesheet Language Transformations) standard, giving you control over the presentation of the data and enabling you to convert the data to other XML documents or to other formats, such as HTML.
• JAXP also provides namespace support, allowing you to work with DTDs that might otherwise have naming conflicts.
• JAXP ships with the standard Java SDK.
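As a rough illustration of the XSLT support mentioned above, the sketch below uses the javax.xml.transform classes to turn a small XML note into an HTML fragment. The class name XsltSketch, the sample note and the stylesheet are made up for this example and do not come from the slides.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltSketch {
    // Hypothetical sample data and stylesheet, used only for illustration.
    public static final String XML = "<note><to>you</to><from>me</from></note>";
    public static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/note'>"
      + "<p>To: <xsl:value-of select='to'/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML text and returns the result as a string. */
    public static String transform(String xml, String xsl) throws Exception {
        // The factory compiles the stylesheet into a Transformer.
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(XML, XSL));
    }
}
```

The same Transformer API accepts DOM and SAX sources as well, so the choice of parser model does not restrict the use of XSLT.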

JAXP API

Steps to write application
1. Obtain a parser object.
2. Obtain a source of XML data.
3. Give that source to the parser to parse.
• JAXP has just interfaces for SAX and DOM, plus abstract classes that provide factory methods for obtaining instances of a parser and an XML data source.
• 4 packages:
  • org.xml.sax: SAX distribution
  • org.xml.sax.helpers: SAX distribution
  • org.w3c.dom: DOM in Java
  • javax.xml.parsers: JAXP distribution
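The three steps can be sketched as follows. This is a minimal example, not the only way to do it: the class name ThreeSteps and the idea of using the parse only as a well-formedness check are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ThreeSteps {
    /** Returns true if the given text parses as well-formed XML. */
    public static boolean isWellFormed(String xml) throws Exception {
        // 1. Obtain a parser object from the JAXP factory.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // 2. Obtain a source of XML data (here an in-memory string).
        InputSource source = new InputSource(new StringReader(xml));
        // 3. Give that source to the parser to parse.
        try {
            parser.parse(source, new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // the parser reported a fatal (well-formedness) error
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<a><b/></a>"));
        System.out.println(isWellFormed("<a><b></a>"));
    }
}
```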


SAX Programming model
• Not a W3C standard, but widely adopted, including by IBM and Sun.
• The standard SAX distribution for Java contains 2 packages:
  • org.xml.sax
  • org.xml.sax.helpers
• They contain 11 classes and interfaces.

Classes
• Classes related to the parser:
  • org.xml.sax.XMLReader is the interface that an XML parser's SAX2 driver must implement. It is an interface for reading an XML document using callbacks.
  • javax.xml.parsers.SAXParser defines the API that wraps an XMLReader implementation class. An instance of this class can be obtained from the javax.xml.parsers.SAXParserFactory.newSAXParser() method.

• Classes related to the application that we write:
  • org.xml.sax.ContentHandler: this is the main interface that most SAX applications implement. It defines the methods that the parser uses as callbacks. The parser expects an object of this type to be handed to it, for example as the handler argument of SAXParser.parse().
  • org.xml.sax.helpers.DefaultHandler is a class that implements ContentHandler. It is the default base class for SAX2 event handlers.

• Exception classes: SAXException, SAXParseException

• Helper classes: SAXParserFactory
• When the parser reaches the end of the document, the only data in memory is what your application saved.
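To make the last point concrete, here is a small sketch of a handler that keeps only the text of <to> elements and lets everything else stream past. The class name SaveOnlyWhatYouNeed and the sample element names are assumptions chosen for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaveOnlyWhatYouNeed extends DefaultHandler {
    private final List<String> recipients = new ArrayList<>();
    private StringBuilder text; // non-null only while inside a <to> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("to".equals(qName)) text = new StringBuilder();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks, so append rather than assign.
        if (text != null) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("to".equals(qName)) {
            recipients.add(text.toString());
            text = null; // stop collecting; nothing else is retained
        }
    }

    /** Parses the XML text and returns the content of every <to> element. */
    public static List<String> recipients(String xml) throws Exception {
        SaveOnlyWhatYouNeed handler = new SaveOnlyWhatYouNeed();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.recipients;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(recipients(
            "<notes><note><to>you</to><body>hi</body></note><note><to>me</to></note></notes>"));
    }
}
```

After the parse, the list of recipients is the only part of the document still held in memory; the <body> text was delivered to characters() and discarded.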

SAX Programming model
[Figure: 1. SAXParserFactory creates the SAXParser. 2. The XML source (with an optional DTD) and a class implementing ContentHandler are input to the SAXParser. The SAXParser then calls the handler methods as events occur (startDocument, startElement, characters, endElement, endDocument, etc.), and the handler produces the output.]

org.xml.sax.ContentHandler
• This is the interface that declares the event handling methods of SAX:
  • void characters(char ch[], int start, int length)
  • void startDocument()
  • void endDocument()
  • void startElement(String uri, String localName, String qName, Attributes attributes)
  • void endElement(String uri, String localName, String qName)
  • void processingInstruction(String target, String data)
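The order in which these callbacks fire can be observed with a handler that simply records each event. The following is a sketch under that assumption; the class name EventTrace and the tiny input document are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EventTrace extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        events.add("startElement " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver one run of text in several chunks.
        events.add("characters " + new String(ch, start, length));
    }

    @Override
    public void processingInstruction(String target, String data) {
        events.add("processingInstruction " + target);
    }

    /** Parses the XML text and returns the events in the order they were reported. */
    public static List<String> trace(String xml) throws Exception {
        EventTrace handler = new EventTrace();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String e : trace("<a>hi<b/></a>")) System.out.println(e);
    }
}
```

Each element produces a matching startElement/endElement pair, nested in document order, with startDocument and endDocument bracketing the whole run.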

• DefaultHandler: the easiest way to implement the ContentHandler interface is to extend the DefaultHandler class, defined in the org.xml.sax.helpers package.
• SAXParserFactory, SAXParser: SAXParser is an abstract class. The static newInstance() method of SAXParserFactory returns a factory, and the factory's newSAXParser() method returns a new concrete implementation of SAXParser. newSAXParser() throws a ParserConfigurationException if it is unable to produce a parser that matches the specified configuration of options.
• Xerces parser from Apache: implements the parser and uses the JAXP API (org.apache.xerces.jaxp).
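A hedged sketch of configuring the factory before asking it for a parser is shown below. It turns on namespace awareness so that the handler receives the namespace URI and local name of each element. The class name FactoryConfig and the urn:example namespace are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class FactoryConfig {
    /** Returns "uri|localName|qName" for the root element of the given XML text. */
    public static String describeRoot(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // report uri and localName to the handler
        factory.setValidating(false);    // the default: no DTD validation

        SAXParser parser;
        try {
            parser = factory.newSAXParser();
        } catch (ParserConfigurationException e) {
            // No available parser supports the requested combination of options.
            throw new IllegalStateException("no parser for this configuration", e);
        }

        final StringBuilder root = new StringBuilder();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // Record only the first element seen, which is the root.
                if (root.length() == 0) root.append(uri + "|" + localName + "|" + qName);
            }
        });
        return root.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describeRoot("<n:note xmlns:n='urn:example'><n:to/></n:note>"));
    }
}
```

Without setNamespaceAware(true), the default factory passes empty uri and localName strings and only qName is usable, which is why the listings in these slides compare against qName.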

DefaultHandler and SAXParser

//Program 1: Counting no. of elements
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class CountSax extends DefaultHandler {
    private static int ele = 0;

    public static void main(String s[]) throws Exception {
        if (s.length != 1) {
            System.out.println("Usage: cmd filename");
            System.exit(0);
        }

        // Use the default (non-validating) parser
        SAXParserFactory factory = SAXParserFactory.newInstance();

        /* Creates a new instance of a SAXParser using the currently configured factory parameters. */
        SAXParser saxParser = factory.newSAXParser();
        File f = new File(s[0]);
        if (f.exists())
            // Parse the input
            saxParser.parse(f, new CountSax());
        else
            System.out.println("unknown file");
    }

    // Reset the counter at the start of each document
    public void startDocument() { ele = 0; }

    // Called once per opening tag, so each call is one element
    public void startElement(String uri, String localName, String qName, Attributes attrs) { ele++; }

    public void endDocument() { System.out.println("Number of elements :" + ele); }
}

Execution:
java CountSax note.xml
Number of elements :4

/*Program 2: Creating HTML document to represent note.xml*/
import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.*;
import org.xml.sax.helpers.DefaultHandler;

public class NoteSax extends DefaultHandler {
    PrintWriter out;

    public NoteSax() throws Exception {
        out = new PrintWriter(new BufferedWriter(new FileWriter("note.html")));
    }

    public static void main(String s[]) throws Exception {
        if (s.length != 1) {
            System.out.println("Usage: cmd filename");
            System.exit(0);
        }
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser saxParser = factory.newSAXParser();
        File f = new File(s[0]);
        if (f.exists())
            saxParser.parse(f, new NoteSax());
        else
            System.out.println("unknown file");
    }

    public void startDocument() {}

    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (qName.equals("note")) out.println("<html><head><title>Note</title></head><body>");
        if (qName.equals("to")) out.println(" To, ");
        if (qName.equals("from")) out.println("<p align='right'><font color='black'> -from ");
        if (qName.equals("body") && (attrs.getLength() > 0)) {
            for (int i = 0; i < attrs.getLength(); i++) {
                String aName = attrs.getQName(i);
                String value = attrs.getValue(i);
                if (aName.equals("type")) {
                    if (value.equals("warm")) out.println("<font color='green'>");
                    if (value.equals("cold")) out.println("<font color='red'>");
                    if (value.equals("formal")) out.println("<font color='blue'>");
                }
                if (aName.equals("subject")) out.println("<I>" + value + ":</I>");
            } // end of for
        } // end of if
    }

    // endElement takes no Attributes parameter; with one, it would not override the SAX callback
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("body")) out.println("</font>");
        if (qName.equals("from")) out.println("</font></p>");
    }

    public void endDocument() {
        out.println("</body></html>");
        out.close();
    }

    public void characters(char buf[], int offs, int l) throws SAXException {
        String s = new String(buf, offs, l);
        out.println(s + "<br>");
    }
}

note.xml
<note>
<to>you</to>
<body type="warm" subject="Contemplation">If today was a perfect day then there would be no tomorrow</body>
<from>God</from>
</note>

Execution:
java NoteSax note.xml
creates note.html

note.html (approximately; whitespace between elements is also passed to characters() and adds further <br> lines)
<html><head><title>Note</title></head><body>
 To,
you<br>
<font color='green'>
<I>Contemplation:</I>
If today was a perfect day then there would be no tomorrow<br>
</font>
<br>
<p align='right'><font color='black'> -from
God<br>
</font></p>
</body></html>

DOM
• Document Object Model. It is a standard produced by the W3C.
• All DOM processing assumes that you have read and parsed a complete document into memory so that all parts are equally accessible. The data is represented in the form of a tree.
• Disadvantages:
  1. It is pretty clumsy if you want to pick out just a few elements.
  2. The memory requirement can become restrictive for large documents.

org.w3c.dom package
Interfaces:
• Node
• Document (extends Node): the Document interface represents the entire HTML or XML document.
• NodeList: provides the abstraction of an ordered collection of nodes.
• The Node interface defines static constants for checking the node type, for example Node.ELEMENT_NODE and Node.CDATA_SECTION_NODE.

Methods
• Document methods:
  • public NodeList getElementsByTagName(String tagname)
  • public Element createElement(String tagName) throws DOMException
  • public Comment createComment(String data)
  • public Text createTextNode(String data)
• NodeList methods:
  • public int getLength()
  • public Node item(int index)

• Node methods:
  • Methods to access information about the current node:
    • public String getNodeName()
    • public short getNodeType()
    • public NodeList getChildNodes()
  • Methods to modify the node's children:
    • public Node appendChild(Node newChild) throws DOMException
    • public Node removeChild(Node oldChild) throws DOMException
    • public Node replaceChild(Node newChild, Node oldChild) throws DOMException
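A short sketch of how several of these methods combine is given below. It parses a note, appends a new element and removes an existing one. The class name DomEditSketch, the <cc> element and the sample document are assumptions for illustration. getDocumentElement(), getParentNode() and getTextContent() are standard DOM methods that are not listed above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomEditSketch {
    /** Parses XML text into an in-memory DOM tree. */
    public static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    /** Appends <cc>name</cc> to the root element and removes every <from> element. */
    public static void edit(Document doc, String name) {
        Element root = doc.getDocumentElement();

        // Nodes are created by the Document, then attached with appendChild.
        Element cc = doc.createElement("cc");
        cc.appendChild(doc.createTextNode(name));
        root.appendChild(cc);

        // The returned NodeList is live: it shrinks as nodes are removed,
        // so keep removing item(0) until the list is empty.
        NodeList from = doc.getElementsByTagName("from");
        while (from.getLength() > 0) {
            Node victim = from.item(0);
            victim.getParentNode().removeChild(victim);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<note><to>you</to><from>me</from></note>");
        edit(doc, "them");
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            System.out.println(n.getNodeName() + ": " + n.getTextContent());
        }
    }
}
```

This kind of in-place edit is what the tree model makes easy and what a SAX handler cannot do, since SAX never holds the document in memory.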

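DOM Programming model
[Figure: 1. DocumentBuilderFactory creates the DocumentBuilder. 2. The XML source (with an optional DTD) is input to the DocumentBuilder. 3. The DocumentBuilder parses it and builds the tree, a Document (DOM) made of Node objects. The application's search mechanism then recursively searches the nodes to produce the output.]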

// Program 1: counting no. of elements
import org.w3c.dom.*;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import java.io.*;

public class CountDom {
    public static void main(String str[]) throws Exception {
        File f = new File(str[0]);
        Node n = readFile(f);
        int ele = getElementCount(n);
        System.out.println(ele);
    }

    // Parse the whole file into a DOM tree
    public static Document readFile(File f) throws Exception {
        Document d;
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        d = db.parse(f);
        return d;
    }

    // Count this node (if it is an element) plus all elements below it
    public static int getElementCount(Node node) {
        if (node == null) return 0;
        int sum = 0;
        boolean isElement = (node.getNodeType() == Node.ELEMENT_NODE);
        if (isElement) sum = 1;
        NodeList children = node.getChildNodes();
        if (children == null) return sum;
        for (int i = 0; i < children.getLength(); i++)
            sum += getElementCount(children.item(i));
        return sum;
    }
}

// Program 2: Adding a comment and a node and displaying
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import java.io.*;
import org.w3c.dom.*;

public class AddNodeDom {
    static Node n1;
    static Comment c;

    public static void main(String str[]) throws Exception {
        File f = new File(str[0]);
        Document n = readFile(f);
        // The original listing never initialises c; the comment text here is only a placeholder.
        c = n.createComment("added by AddNodeDom");
        setElements(n);
        display(n);
        System.out.println("done");
    }

    public static Document readFile(File f) throws Exception {
        Document d;
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        d = db.parse(f);
        return d;
    }

    // Print element names, and the text of text and comment nodes, depth first
    public static void display(Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE)
            System.out.print(node.getNodeName() + ":");
        if (node.getNodeType() == Node.TEXT_NODE || node.getNodeType() == Node.COMMENT_NODE)
            System.out.println(node.getNodeValue().trim());
        NodeList children = node.getChildNodes();
        if (children != null)
            for (int i = 0; i < children.getLength(); i++)
                display(children.item(i));
    }

    // Remember the displayname element and append it, with the comment, to each servlet element
    public static void setElements(Node node) {
        if (node == null) return;
        boolean isEle = (node.getNodeType() == Node.ELEMENT_NODE);
        if (isEle && node.getNodeName().equals("displayname")) n1 = node;
        if (isEle && node.getNodeName().equals("servlet")) {
            node.appendChild(c);
            // n1 is null until a displayname element has been visited
            if (n1 != null) node.appendChild(n1);
        }
        NodeList children = node.getChildNodes();
        if (children != null)
            for (int i = 0; i < children.getLength(); i++)
                setElements(children.item(i));
    }
}
