XML and Java

Alex Chaffee, alex@jguru.com http://www.purpletech.com
©1996-2000 jGuru.com

Overview
• Why Java and XML?
• Parsers: DOM, JDOM, SAX
• Using XML from JSP
• Java/XML Object Mapping
• Resources

Why Java/XML?
• XML maps well to Java
  - late binding
  - hierarchical (OO) data model
• Unicode support in Java
• XML structures map well to Java objects
• Portability
• Network friendly

XML Parsers
• Validating/Non-Validating
• Tree-based
• Event-based
• SAX-compliance
• Not technically parsers
  - XSL
  - XPath

Some Java XML Parsers
• DOM
  - Sun JAXP
  - IBM XML4J
  - Apache Xerces
  - Resin (Caucho)
  - DXP (DataChannel)
• SAX
  - Sun JAXP
  - SAXON
• JDOM

DOM API
• Tree-based
• Node classes
• Part of W3C spec
• Sorting/Modifying of Elements
• Sharing document with other applications

XML is a Tree
<?xml version="1.0"?>
<!DOCTYPE menu SYSTEM "menu.dtd">
<menu>
  <meal name="breakfast">
    <food>Scrambled Eggs</food>
    <food>Hash Browns</food>
    <drink>Orange Juice</drink>
  </meal>
  <meal name="snack">
    <food>Chips</food>
  </meal>
</menu>

[Figure: the document drawn as a tree]
menu
  meal (name = "breakfast")
    food
      "Scrambled Eggs"
    food
      "Hash Browns"
    drink
      "Orange Juice"
  meal
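The tree view can be reproduced with the DOM parser that ships with the JDK (JAXP). This is a minimal sketch: the class name MenuTree and its methods are made up for illustration, and the DOCTYPE line is left out so that no menu.dtd file is needed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original slides.
class MenuTree {
    // The menu document from the slide, without the DOCTYPE declaration.
    static final String MENU_XML =
        "<menu>"
        + "<meal name=\"breakfast\">"
        + "<food>Scrambled Eggs</food><food>Hash Browns</food><drink>Orange Juice</drink>"
        + "</meal>"
        + "<meal name=\"snack\"><food>Chips</food></meal>"
        + "</menu>";

    // Parses an XML string into a DOM Document; wraps checked exceptions for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Appends one line per element or text node, indented two spaces per level.
    static void outline(Node node, int depth, StringBuilder sb) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            sb.append('"').append(node.getNodeValue()).append('"').append('\n');
            return;
        }
        sb.append(node.getNodeName()).append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            outline(children.item(i), depth + 1, sb);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        outline(parse(MENU_XML).getDocumentElement(), 0, sb);
        System.out.print(sb);
    }
}
```

Attributes do not show up in the outline because, in DOM, they are not child nodes of their element.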

DOM API (cont'd)
• Based on Interfaces
  - Good design style - separate interface from implementation
  - Document, Text, Processing Instruction, Element ALL are interfaces
  - All extend interface Node
  - Including interface Attr (parentNode is null, etc)

DOM Example
// Assumes: import org.w3c.dom.*;
// "out" and normalize() are not shown on the slide; they are assumed to be
// an output writer and a helper that escapes attribute values.
public void print(Node node) {
    // recursive method call using DOM API...
    int type = node.getNodeType();
    switch (type) {
    case Node.ELEMENT_NODE:
        // print element with attributes
        out.print('<');
        out.print(node.getNodeName());
        // getAttributes() returns a NamedNodeMap, not an array
        NamedNodeMap attrs = node.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            out.print(' ');
            out.print(attr.getNodeName());
            out.print("=\"");
            out.print(normalize(attr.getNodeValue()));
            out.print('"');
        }
        out.print('>');
        NodeList children = node.getChildNodes();
        if (children != null) {
            int len = children.getLength();
            for (int i = 0; i < len; i++) {
                print(children.item(i));
            }
        }
        break;
    case Node.ENTITY_REFERENCE_NODE:
        // handle entity reference nodes
        // ...
    }
}

DOM API Highlights
• Node
  - getNodeType()
  - getNodeName()
  - getNodeValue()
    • returns null for Elements
  - getAttributes()
    • returns null for non-Elements
  - getChildNodes()
  - getParentNode()
• Element
  - getTagName()
    • same as getNodeName()
  - getElementsByTagName(String tag)
    • get all descendants of this name, recursively
  - normalize()
    • smooshes adjacent text nodes together
• Attr
  - attributes are not technically child nodes
  - getParentNode() et al. return null
  - getName(), getValue()
• Document
  - has one child element - the root element
    • call getDocumentElement()
  - contains factory methods for creating attributes, comments, etc.
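The behaviours listed above can be checked against the JDK's built-in DOM implementation. A small sketch, with a made-up class name (DomHighlights) and a one-meal sample document:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original slides.
class DomHighlights {
    // Parses an XML string into a DOM Document; wraps checked exceptions for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<menu><meal name=\"snack\"><food>Chips</food></meal></menu>");
        Element root = doc.getDocumentElement();        // the root element, <menu>
        Element meal = (Element) root.getFirstChild();  // <meal name="snack">
        Element food = (Element) meal.getFirstChild();  // <food>
        Node text = food.getFirstChild();               // the text node inside <food>

        System.out.println(root.getTagName().equals(root.getNodeName()));   // tag name == node name
        System.out.println(meal.getNodeValue() == null);                    // elements have no node value
        System.out.println(text.getAttributes() == null);                   // non-elements have no attributes
        System.out.println(meal.getAttributeNode("name").getParentNode() == null); // Attr has no parent
        System.out.println(root.getElementsByTagName("food").getLength());  // searches all descendants

        // Appending a second text node leaves two adjacent text children...
        food.appendChild(doc.createTextNode(" and Salsa"));
        System.out.println(food.getChildNodes().getLength());
        // ...and normalize() merges them back into one.
        food.normalize();
        System.out.println(food.getChildNodes().getLength());
        System.out.println(food.getFirstChild().getNodeValue());
    }
}
```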

DOM Level 2
• Adds namespace support, extra methods
• Not supported by Java XML processors yet

The Trouble With DOM
• Written by C programmers
• Cumbersome API
  - Node does double-duty as collection
  - Multiple ways to traverse, with different interfaces
• Tedious to walk around tree to do simple tasks
• Doesn't support Java standards (java.util collections)

JDOM: Better than DOM
• Java from the ground up
• Open source
• Clean, simple API
• Uses Java Collections

JDOM vs. DOM
(each line reads JDOM / DOM)
• Classes / Interfaces
• Java / Many languages
• Java Collections / Idiosyncratic collections
• getChildText() and other useful methods / getNextSibling() and other useless methods

JDOM: The Best of Both Worlds
• Clean, easy to use API
  - document.getRootElement().getChild("book").getChildText("title")
• Random-access tree model (like DOM)
• Can use SAX for backing parser
• Open Source, not Standards Committee
  - Allowed benevolent dictatorship -> clean design

JDOM Example
XMLOutputter out = new XMLOutputter();
out.output(element, System.out);

Or...

public void print(Element node) {
    // recursive method call using JDOM API...
    out.print('<');
    out.print(node.getName());
    List attrs = node.getAttributes();
    for (int i = 0; i < attrs.size(); i++) {
        Attribute attr = (Attribute) attrs.get(i);
        out.print(' ');
        out.print(attr.getName());
        out.print("=\"");
        out.print(attr.getValue());
        out.print('"');
    }
    out.print('>');
    List children = node.getChildren();
    if (children != null) {
        for (int i = 0; i < children.size(); i++) {
            // java.util.List has get(), not item(); child elements are Elements
            print((Element) children.get(i));
        }
    }
    // ...
}

JDOM Example
// addAttribute()/addChild() are the method names from the JDOM betas;
// JDOM 1.0 uses setAttribute() and addContent() instead.
public Element toElement(User dobj) throws IOException {
    User obj = (User) dobj;
    Element element = new Element("user");
    element.addAttribute("userid", "" + obj.getUserid());
    String val;
    val = obj.getUsername();
    if (val != null) {
        element.addChild(new Element("username").setText(val));
    }
    val = obj.getPasswordEncrypted();
    if (val != null) {
        element.addChild(new Element("passwordEncrypted").setText(val));
    }
    return element;
}

JDOM Example
public User fromElement(Element element) throws DataObjectException {
    List list;
    User obj = new User();
    String value = null;
    Attribute userid = element.getAttribute("userid");
    if (userid != null) {
        obj.setUserid(userid.getIntValue());
    }
    value = element.getChildText("username");
    if (value != null) {
        obj.setUsername(value);
    }
    value = element.getChildText("passwordEncrypted");
    if (value != null) {
        obj.setPasswordEncrypted(value);
    }
    return obj;
}

DOMUtils
• DOM is clunky
• DOMUtils.java - set of utilities on top of DOM
• http://www.purpletech.com/code
• Or just use JDOM

Event-Based Parsers
• Scans document top to bottom
• Invokes callback methods
• Treats XML not like a tree, but like a list (of tags and content)
• Pro:
  - Not necessary to cache entire document
  - Faster, smaller, simpler
• Con:
  - must maintain state on your own
  - can't easily backtrack or skip around
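"Must maintain state on your own" looks like this in practice. The sketch below uses the SAX2 parser bundled with the JDK to collect the foods of one named meal; the class FoodCollector and the sample document are illustrative assumptions, not code from the talk.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, not part of the original slides.
class FoodCollector extends DefaultHandler {
    static final String MENU_XML =
        "<menu>"
        + "<meal name=\"breakfast\"><food>Scrambled Eggs</food><food>Hash Browns</food>"
        + "<drink>Orange Juice</drink></meal>"
        + "<meal name=\"snack\"><food>Chips</food></meal>"
        + "</menu>";

    private final String wantedMeal;
    private final List<String> foods = new ArrayList<>();
    private boolean inWantedMeal;  // state: are we inside the meal we care about?
    private StringBuilder text;    // state: non-null while inside a <food> of that meal

    FoodCollector(String wantedMeal) {
        this.wantedMeal = wantedMeal;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("meal")) {
            inWantedMeal = wantedMeal.equals(atts.getValue("name"));
        } else if (inWantedMeal && qName.equals("food")) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks, so accumulate it.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("food") && text != null) {
            foods.add(text.toString());
            text = null;
        } else if (qName.equals("meal")) {
            inWantedMeal = false;
        }
    }

    // Runs one top-to-bottom pass over the document and returns the collected foods.
    static List<String> foodsFor(String xml, String meal) {
        FoodCollector handler = new FoodCollector(meal);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.foods;
    }

    public static void main(String[] args) {
        System.out.println(foodsFor(MENU_XML, "breakfast"));
    }
}
```

With DOM the same question is a tree lookup; with SAX the two fields above stand in for the tree position the parser never keeps.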

SAX API
• Grew out of xml-dev mailing list (grassroots)
• Event-based
• startElement(), endElement()
• Application intercepts events
• Not necessary to cache entire document

SAX API (cont'd)
public void startElement(String name, AttributeList atts) {
    // perform implementation
    out.print("Element name is " + name);
    out.print(", first attribute is " + atts.getName(0) + ", value is " + atts.getValue(0));
}
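The callback above uses the SAX 1 signature (AttributeList). For comparison, here is a sketch of the same handler against SAX 2 (DefaultHandler and Attributes), which is what the JDK's bundled parser expects. The class name and the guard for attribute-less elements are additions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, not part of the original slides.
class FirstAttributePrinter extends DefaultHandler {
    final List<String> lines = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        String line = "Element name is " + qName;
        // Only describe the first attribute when the element actually has one.
        if (atts.getLength() > 0) {
            line += ", first attribute is " + atts.getQName(0)
                  + ", value is " + atts.getValue(0);
        }
        lines.add(line);
    }

    // Parses the XML and returns one description line per start tag.
    static List<String> describe(String xml) {
        FirstAttributePrinter handler = new FirstAttributePrinter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.lines;
    }

    public static void main(String[] args) {
        for (String line : describe("<menu><meal name=\"snack\"/></menu>")) {
            System.out.println(line);
        }
    }
}
```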

XPath
• The stuff inside the quotes in XSL
• Directory-path metaphor for navigating XML document
  - "/curriculum/class[4]/student[1]"

• Implementations
  - Resin (Caucho) built on DOM
  - JDOM has one in the "contrib" package
• Very efficient API for extracting specific info from an XML tree
  - Don't have to walk the DOM or wait for the SAX
  - Con: yet another syntax / language, without full access to Java libraries
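A sketch of the directory-path idea using javax.xml.xpath, the XPath API that later shipped with the JDK (not one of the implementations listed above). The curriculum document and the class name XPathLookup are invented for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original slides.
class XPathLookup {
    // Made-up sample data: two classes, three students.
    static final String CURRICULUM =
        "<curriculum>"
        + "<class><student>Ann</student></class>"
        + "<class><student>Bob</student><student>Cy</student></class>"
        + "</curriculum>";

    // Evaluates an XPath expression against an XML string and returns the result as text.
    static String evaluate(String expression, String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath or XML: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // First student of the second class; positions are 1-based.
        System.out.println(evaluate("/curriculum/class[2]/student[1]", CURRICULUM));
        // Number of students anywhere in the document.
        System.out.println(evaluate("count(//student)", CURRICULUM));
    }
}
```

One expression replaces the nested loops a DOM walk would need, at the price of a second language embedded in string literals.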

XSL
• eXtensible Stylesheet Language
• transforms one XML document into another
• XSL file is a list of rules
• Java XSL processors exist
  - Apache Xalan
    • (not to be confused with Apache Xerces)
  - IBM LotusXSL
  - Resin
  - SAXON
  - XT
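"A list of rules" in concrete terms: the sketch below runs a two-rule stylesheet through javax.xml.transform, the transformation API bundled with the JDK. The stylesheet, the class name MenuToHtml, and the choice of output markup are assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, not part of the original slides.
class MenuToHtml {
    // Two rules: the menu becomes a list, each food becomes a list item.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/menu\">"
        + "<ul><xsl:apply-templates select=\"meal/food\"/></ul>"
        + "</xsl:template>"
        + "<xsl:template match=\"food\">"
        + "<li><xsl:value-of select=\".\"/></li>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies STYLESHEET to the given XML and returns the transformed text.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
            "<menu><meal name=\"snack\"><food>Chips</food></meal></menu>"));
    }
}
```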

Trouble with XSL
• It's a programming language masquerading as a markup language
• Difficult to debug
• Turns traditional programming mindset on its head
  - Declarative vs. procedural
  - Recursive, like Prolog
• Doesn't really separate presentation from code

JSP
• JavaServer Pages
• Outputting XML

<%
  User user = loadUser(request.getParameter("username"));
  response.setContentType("text/xml");
%>
<user>
  <username><%=user.getUsername()%></username>
  <realname><%=user.getRealname()%></realname>
</user>

• Can also output HTML based on XML parser, naturally (see my "JSP and XML" talk, or http://www.purpletech.com)

XMLC
• A radical solution to the problem of how to separate presentation template from logic...
• ...to actually separate the presentation template from the logic!

XMLC Architecture
[Diagram: XMLC architecture]
HTML (with ID tags) -> XMLC -> HTML Object (automatically generated)
Data -> (reading data) -> Java Class (e.g. Servlet) -> (setting values) -> HTML Object
HTML Object -> HTML (dynamically-generated)

XMLC Details
• Open-source (xmlc.enhydra.org)
• Uses W3C DOM APIs
• Generates "set" methods per tag
  - Source: <H1 id="title">Hello</H1>
  - Code: obj.setElementTitle("Goodbye")
  - Output: <H1>Goodbye</H1>
• Allows graphic designers and database programmers to develop in parallel
• Works with XML source too

XML and Java in 2001
• Many apps' config files are in XML
  - Ant
  - Tomcat
  - Servlets
• Several XML-based Sun APIs
  - JAXP
  - JAXM
  - ebXML
  - SOAP (half-heartedly supported)

Java XML Documentation
• Jdox
  - Javadoc -> single XML file
  - http://www.componentregistry.com/
  - Ready for transformation (e.g. XSL)
• Java Doclet
  - http://www.sun.com/xml/developers/doclet
  - Javadoc -> multiple XML files (one per class)
• Cocoon
  - Has alpha XML doclet

Soapbox: DTDs are irrelevant
• DTDs describe structure of an unknown document
• But in most applications, you already know the structure - it's implicit in the code
• If the document does not conform, there will be a runtime error, and/or corrupt/null data
• This is as it should be! GIGO.
• You could have a separate "sanity check" phase, but parsing with validation "on" just slows down your app
• Useful for large-scale document-processing applications, but not for custom apps or transformations
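For reference, a separate "sanity check" phase might look like the sketch below: a validating JAXP parse that collects validity errors instead of building anything useful. The DTD is a guess at what menu.dtd could contain (the talk does not show it), and the class name SanityCheck is made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, not part of the original slides.
class SanityCheck {
    // Assumed DTD for the menu document, embedded as an internal subset.
    static final String DTD =
        "<!DOCTYPE menu ["
        + "<!ELEMENT menu (meal*)>"
        + "<!ELEMENT meal (food|drink)*>"
        + "<!ATTLIST meal name CDATA #REQUIRED>"
        + "<!ELEMENT food (#PCDATA)>"
        + "<!ELEMENT drink (#PCDATA)>"
        + "]>";

    // Returns the validity problems found in xmlBody; an empty list means it conforms.
    static List<String> validate(String xmlBody) {
        final List<String> problems = new ArrayList<>();
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);  // validation is off by default
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // Validity errors are recoverable, so record them and keep parsing.
                public void error(SAXParseException e) { problems.add(e.getMessage()); }
                // Well-formedness errors are fatal and abort the parse.
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + xmlBody)));
        } catch (Exception e) {
            problems.add(String.valueOf(e.getMessage()));
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(validate("<menu><meal name=\"snack\"><food>Chips</food></meal></menu>"));
        System.out.println(validate("<menu><meal><dessert>Pie</dessert></meal></menu>"));
    }
}
```

Running the check once, when a document enters the system, keeps the validation cost out of the request path.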

XML and Server-Side Java

Server-Side Java-XML Architecture
• Many possible architectures
  - XML Data Source
    • disk or database or other data feed
  - Java API
    • DOM or SAX or XPath or XSL
  - XSL
    • optional transformation into final HTML, or HTML snippets, or intermediate XML
  - Java Business Logic
    • JavaBeans and/or EJB
  - Java Presentation Code
    • Servlets and/or JSP and/or XMLC

Server-Side Java-XML Architecture
[Diagram: layers, from presentation down to data]
• Java UI: JSP, HTML, Servlet
• Java Business Logic: JavaBeans, EJB
• XML Processors: DOM, SAX, XPath, XSL
• XML Data Sources: Filesystem, XML-savvy RDBMS, XML Data Feed

Server-Side Architecture Notes
• Note that you can skip any layer, and/or call within layers
  - e.g. XML->XSL->DOM->JSP, or
  - JSP->Servlet->DOM->XML

Cache as Cache Can
• Caching is essential
• Whatever its advantages, XML is slow
• Cache results on disk and/or in memory
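An in-memory version of "cache results" can be very small. The sketch below wraps any slow rendering step (a parse, an XSL transformation) behind a map; the class name ResultCache and the idea of keying by a string such as a file path are assumptions for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache, not part of the original slides.
class ResultCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> slowRenderer;

    // slowRenderer is the expensive step, e.g. parse + transform for a given key.
    ResultCache(Function<String, String> slowRenderer) {
        this.slowRenderer = slowRenderer;
    }

    // Returns the cached result, running the renderer only on the first request for a key.
    String get(String key) {
        return cache.computeIfAbsent(key, slowRenderer);
    }

    // Drops a cached result, e.g. after the underlying XML file has changed.
    void invalidate(String key) {
        cache.remove(key);
    }

    public static void main(String[] args) {
        AtomicInteger renders = new AtomicInteger();
        ResultCache cache = new ResultCache(key -> {
            renders.incrementAndGet();  // stands in for slow XML processing
            return "<p>" + key + "</p>";
        });
        cache.get("menu.xml");
        cache.get("menu.xml");
        System.out.println("renders: " + renders.get());
    }
}
```

Invalidation is the hard part in practice; this sketch leaves the decision of when to call invalidate() to the caller.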

XML <-> Java Object Mapping

XML and Object Mapping
• Java -> XML
  - Start with Java class definitions
  - Serialize them - write them to an XML stream
  - Deserialize them - read values in from previously serialized file
• XML -> Java
  - Start with XML document type
  - Generate Java classes that correspond to elements
  - Classes can read in data, and write in compatible format (shareable)
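The Java -> XML direction can be hand-rolled with nothing but the JDK's DOM API. The sketch below serializes a made-up User class to an element and reads it back; the class, its fields, and the element names are illustrative, chosen to echo the JDOM examples earlier in the talk.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical mapper, not part of the original slides.
class UserMapper {
    // Minimal data object for the example.
    static class User {
        int userid;
        String username;
    }

    // Creates an empty DOM document to act as the factory for new elements.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    // Java -> XML: one attribute for the id, one child element per non-null property.
    static Element toElement(Document doc, User user) {
        Element element = doc.createElement("user");
        element.setAttribute("userid", Integer.toString(user.userid));
        if (user.username != null) {
            Element child = doc.createElement("username");
            child.setTextContent(user.username);
            element.appendChild(child);
        }
        return element;
    }

    // XML -> Java: missing attributes or children leave the field at its default.
    static User fromElement(Element element) {
        User user = new User();
        if (element.hasAttribute("userid")) {
            user.userid = Integer.parseInt(element.getAttribute("userid"));
        }
        NodeList names = element.getElementsByTagName("username");
        if (names.getLength() > 0) {
            user.username = names.item(0).getTextContent();
        }
        return user;
    }

    public static void main(String[] args) {
        User original = new User();
        original.userid = 42;
        original.username = "alex";
        User copy = fromElement(toElement(newDocument(), original));
        System.out.println(copy.userid + " " + copy.username);
    }
}
```

Every new property means touching both methods, which is the tedium the tools listed next try to remove.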

Java -> XML Implementations
• Java -> XML
  - BeanML
  - Coins / BML
  - Sun's XMLOutputStream/XMLInputStream
  - XwingML (Bluestone)
  - JDOM BeanMapper
  - Quick?
  - JSP (must roll your own)

BeanML Code (Extract)
<?xml version="1.0"?>
<bean class="java.awt.Panel">
  <property name="background" value="0xeeeeee"/>
  <property name="layout">
    <bean class="java.awt.BorderLayout"/>
  </property>
  <add>
    <bean class="demos.juggler.Juggler" id="Juggler">
      <property name="animationRate" value="50"/>
      <call-method name="start"/>
    </bean>
    <string>Center</string>
  </add>
  ...
</bean>

Coins
• Part of MDSAX
• Connect XML Elements and JavaBeans
• Uses SAX parser, Docuverse DOM to convert XML into JavaBean
• Uses BML (Bindings Markup Language) to define mapping of XML elements to Java classes

JDOM BeanMapper
• Written by Alex Chaffee
• Default implementation outputs element-only XML, one element per property, named after property
• Also goes other direction (XML->Java)
  - Doesn't (yet) automatically build bean classes
• Can set mapping to other custom element names / attributes

XMLOutputStream/XMLInputStream
• From some Sun engineers
  - http://java.sun.com/products/jfc/tsc/articles/persistence/
• May possibly become core, but unclear
• Serializes Java classes to and from XML
• Works with existing Java Serialization
• Not tied to a specific XML representation
  - You can build your own plug-in parser
• Theoretically, can be used for XML->Java as well

Sample XWingML code
<?xml version="1.0"?>
<!DOCTYPE XwingML SYSTEM "file:///c:/XwingML/xml/xwingml.dtd">
<XwingML>
  <Classes>
    <Instance name="OpenFile" className="XMLOpenFile"/>
    <Instance name="SaveFile" className="XMLSaveFile"/>
    <Instance name="ParseFile" className="XMLParseFile"/>
    <Instance name="About" className="XMLAbout"/>
  </Classes>
  <JFrame name="MainFrame" title="Bluestone XMLEdit" image="icon.gif"
          x="10%" y="10%" width="80%" height="80%">
    <JMenuBar>
      <JMenu text="File" mnemonic="F">
        <JMenuItem icon="open.gif" text="Open..." mnemonic="O"
                   accelerator="VK_O,CTRL_MASK" actionListener="OpenFile"/>
        <JMenuItem icon="save.gif" text="Save" mnemonic="S"
                   accelerator="VK_S,CTRL_MASK" actionCommand="save" actionListener="SaveFile"/>
        <JMenuItem icon="save.gif" text="Save As..." mnemonic="a"
                   actionCommand="saveas" actionListener="SaveFile"/>
        <Separator/>
        <JMenuItem text="Exit" mnemonic="x" accelerator="VK_X,CTRL_MASK"
                   actionListener="com.bluestone.xml.swing.XwingMLExit"/>

XML -> Java Implementations
• XML -> Java
  - Java-XML Data Binding (JSR 31 / Adelard)
  - IBM XML Master (Xmas)
  - Purple Technology XDB
  - Breeze XML Studio (v2)

Adelard (Java-XML Data Binding)
• Java Specification Request 31
• Still vapor! (?)

Castor
• Implementation of JSR 31
  - http://castor.exolab.org
• Open-source

IBM XML Master ("XMas")
• Not vaporware - it works!!!
• Same idea as Java-XML Data Binding
• From IBM Alphaworks
• Two parts
  - builder application
  - visual XML editor beans

Brett McLaughlin's Data Binding Package
• See JavaWorld articles

Purple Technology XDB
• In progress (still vapor)
  - Currently rewriting to use JDOM
  - JDOMBean helps
• Three parts
  - XML utility classes
  - XML->Java data binding system
  - Caching filesystem-based XML database (with searching)

Conclusion
• Java and XML are two great tastes that taste great together

Resources
• XML Developments:
  - Elliotte Rusty Harold:
    • Café Con Leche - metalab.unc.edu/xml
    • Author, XML Bible
  - Simon St. Laurent
    • www.simonstl.com
    • Author, Building XML Applications
• General
  - www.xmlinfo.com
  - www.oasis-open.org/cover/xml.html
  - www.xml.com
  - www.jdm.com
  - www.purpletech.com/xml

Resources: Java-XML Object Mapping
• JSR 31
  - http://java.sun.com/aboutJava/communityprocess/jsr/jsr_031_xmld.html
  - http://java.sun.com/xml/docs/binding/DataBinding.html
• XMas
  - http://alphaworks.ibm.com/tech/xmas

Resources
• XSL:
  - James Tauber:
    • xsl tutorial: www.xmlsoftware.com/articles/xsl-byexample.html
  - Michael Kay
    • Saxon
    • home.iclweb.com/icl2/mhkay/Saxon.html
  - James Clark
    • XP Parser, XT
    • editor, XSL Transformations W3C Spec

Resources:
• JDOM
  - www.jdom.org

Thanks To
• John McGann
• Daniel Zen
• David Orchard
• My Mom