XML Integration with Java

4hr 56m
Learn how to choose the right Java API for your application and get started coding with XML and Java.
In this course, author David Gassner shows you how to read and create XML strings and files, decide
whether to use a streaming or tree-based API, and find out which APIs are compatible with Android.
Plus, discover how to use both standard APIs that are included with the Java SE and EE distributions,
such as DOM, SAX, and JAXB, and learn about popular open-source libraries such as JDOM and the
Simple XML Serialization framework.
Topics include:

Choosing a Java-based XML API


Reading XML as a string
Comparing streaming and tree-based APIs
Parsing XML with SAX
Creating and reading XML with DOM
Adding data to an XML document with JDOM
Reading and writing XML with StAX
Working with JAXB and annotated classes
Comparing Simple XML Serialization to JAXB

Welcome
My name is David Gassner, and this is XML Integration with Java. In this course, I'll show you
a variety of methods for reading and creating XML-formatted strings and files in the Java programming
language. I'll first describe how to choose an XML API: whether to use a streaming or a tree-based
API, and which of these APIs can be used with Android. Then I'll show you the code, starting with the
oldest APIs, SAX and DOM.
I'll also demonstrate later additions to Java SE that can streamline your code, such as StAX and
JAXB. And I'll also help you get started with JDOM and the Simple XML Serialization framework. You
have many choices in how to work with XML and Java. I hope this course helps you choose the right
API for your application and helps you get started with it.

What you should know


This course is designed for software developers who work with the Java programming language, and
specifically for developers who are working on applications that use XML in some way. XML can be
used for many different purposes, including working with structured data, exchanging data between
computers, and, in the world of the web, working with XML-based web services, whether they're
based on the SOAP protocol or on REST. To get started with this course, you'll need a basic
understanding of Java.
You'll need to know your way around Eclipse, and you'll need to understand basic features of the
language. If you're new to Java or Eclipse, or you'd like a refresher on any of its features, take a look at
the course Java Essential Training. The course uses Java 7 and depends on certain features of Java 7
that aren't available in earlier versions of Java or in the version of Java that's used in Android
development. If you'd like to learn about those features, look at the course Java Advanced Training.
The course also assumes basic XML literacy. There are some very common XML terms, and I'll be
using these frequently: for example, elements, attributes, and CDATA sections. If you don't know what
these are, take a look at the course XML Essential Training. I've also included a movie early in the
course that goes through the basic XML terminology. So if you have a basic understanding of Java and
you know your XML terms, you're ready to begin this course, XML Integration with Java.

Using the exercise files


This course is accompanied by exercise files that you can use to follow along with the demonstrations
that I do on screen. I've copied the exercise files to my desktop, but you can place them anywhere on
your hard disk. The exercise files are organized by chapter, one folder for each chapter of the course.
Within each of the folders, you'll find zip files. Each of the zip files is an Eclipse Project archive,
intended to be imported into Eclipse. There's one folder for each of the chapters for starting projects,
and then, under the Solutions folder, there's a set of folders for the chapters as well.
And within those folders, you'll find the Solution Projects, the finished versions of each exercise.
Additionally, there is an Assets folder. It contains a data folder with all of the data files that I use in the
course, including both JSON and XML files, and a Libs folder, which contains the JAR files that I use in
the course. Whenever I use an external JAR file, I'll tell you where to download it on the web, because
these are all free, open-source JAR files.
But I've included the JAR files here for convenience, to make sure that you can use exactly the
same version of any particular JAR file that I'm using. To work through the course, I'm using Eclipse
4.3, or Kepler, and specifically the version for Java developers. If you prefer, you can use
Eclipse for Java EE developers. It'll look different than this by default, but to make it look the same as
the Java version, switch your perspective.
Choose Window, Open Perspective, Other and then switch to the Java perspective. If you prefer, you
can work in the XML perspective, which is available from the same place. The XML perspective has a
couple of additional views, named Documentation and Properties, that are specific to XML. I'll be
using the Java perspective mostly throughout this course. To work with the exercise files, import them
into Eclipse.
Many of the exercise files depend on a project called DataProvider, which is part of those projects' build
path. You need to have that project imported and open. To import it, choose File > Import,
choose Existing Projects into Workspace, choose Select Archive File, click Browse, and go to the Exercise Files
folder, then to the 01_GettingStarted folder. Then follow the prompts to import the DataProvider project.
I'll describe in detail what this project does, but briefly, it has a few Java classes and a library
named json-simple that's used throughout the course to get data that can be serialized into XML. This
project replaces a database or any other more complex data provider. Once you have the DataProvider
project open, you can open other projects. So, for example, I'll import something from the chapter on
DOM, or the Document Object Model.
I'll go back up to the exercise files, then down to 03_DOM, and then I'll import the project
DOMCreateDocument. This project requires data, and in order to get it, you have to have the
DataProvider project in the DOMCreateDocument project's build path. For the most part, this binding will already be
done for you, but if you have any issues, here's how you can double-check it. Right-click on the project
that needs to be bound to DataProvider.
Go to Build Path and then Configure Build Path. In the Projects tab, you should see the DataProvider
project listed. If you don't, click the Add button and add it to this project's build path. And then, you'll
be able to start working on that project. If you want to look at the solution for any particular project,
you can open it at the same time because it has a different name than the starting project. So I'll select
File > Import > Existing Projects Into Workspace. And this time, I'll go to the Solutions folder. Once
again to 03_DOM, and I'll choose the finished version of this project, DOMCreateDocument_Solution.
Now click Finish and I'll be able to look at the finished code for that project. So that's a tour of the
exercise files, how to import the zip files into Eclipse, how to make sure that you have the build paths
set up correctly, and how you can use the solution projects if you want to peek ahead.

Reviewing XML terminology


XML developers use a common vocabulary to refer to the different parts of an XML
file. It doesn't matter which programming language you're using to work with XML. You might be
using Java, C#, or PHP. It's all XML. If you already have a deep understanding of XML, you
might want to skip this movie. But if you're new to XML or want a review, here are some of the most
common terms that you'll hear. The term XML document is sometimes used to refer to an actual file,
but it's also used to refer to the entire XML body. XML is a markup language, and as with all such
languages, it depends on the use of tags to describe logical parts of the XML structure. The XML
architecture supports three types of tags. A begin tag starts with an opening angle bracket and ends with a
closing angle bracket, with the name of the element in the middle, as in <name>. An end tag has the same structure,
but has a forward slash right after the opening bracket, as in </name>. If you have a begin tag, then you must have an
end tag. That's an absolute rule of XML. As markup languages go, that makes it different from HTML,
which can be more forgiving. An empty tag has a forward slash at the end, before
the closing bracket, as in <name/>. That means that this is an element without any content. An empty tag does not
have an end tag. It's essentially a begin tag and an end tag put together in one. It's up to the
XML APIs to parse these tags for you and return information about the logical parts of the XML: the
elements, the attributes, and so on. At the top of an XML file, you'll frequently see a bit of code that
starts with an angle bracket and a question mark, then xml, then a version and, optionally, an
encoding value, and ends with a question mark and the closing bracket, as in
<?xml version="1.0" encoding="UTF-8"?>. This is called the XML
declaration. It's optional, and you can have a well-formed XML document without the declaration, but
you'll frequently see it. An XML element is the most common logical component of an XML
document. An XML element is divided into markup and content. For example, in this XML, there's an
XML declaration at the top, and then there's a customers element. This is known as the root element. It
contains, or is the parent of, all the other elements in the XML file. The next element is a child of the
root element. That's the customer element. And then the customer element has child elements as well.
In this example, those are elements named name and phone. To represent data, XML typically uses text
nodes or CDATA sections. Elements are said to have child elements or child content. In this example,
the customer has two child elements called name and phone, and each of those has a text node as a
child. These values are sometimes referred to as the child text node, sometimes as the
element's content, and still other times as the text value. It depends on which API you're
using. Different APIs use different terminology and see that text in different ways. You'll also
frequently see text represented with CDATA sections. A CDATA section is typically used where you're
dealing with longer text, or where the text values might have special characters such as ampersands,
single quotes, and double quotes. These types of characters can cause problems for XML, and when they're
included in a text node, they have to be written out as values that are known as entities. I'll describe
those in a bit. But when you wrap text in a CDATA section, the text can use any characters other than the
closing ]]> sequence. A CDATA section starts with an opening angle bracket and an exclamation mark, then a
square bracket, CDATA, and another square bracket, then the text value, then a couple of closing square brackets
and a closing angle bracket, as in <![CDATA[some text]]>. When you're using XML APIs, you don't need to type this XML text yourself. It's all
handled by the API. But you, as a developer, need to know how to recognize CDATA sections when you
see them. Values can be stored in CDATA sections or in text nodes. But they can also appear in
attributes. An attribute belongs to an element, and it's always placed in the begin tag of the element.
The XML specification says that attributes can appear in any order. And so when you're using XML
APIs, you'll typically refer to attributes by their name, and not by their position within the begin tag.
This is an example of an attribute. It has a name and a value. In XML, attribute values always must be
wrapped in quotes. That distinguishes XML from HTML, where you'll frequently see values entered
without quotes, especially numeric values. XML documents can be validated against specifications that
describe what elements can be used, what their data types might be, and what the relationship of
different elements might be to each other. There are two major architectures for describing this
information. The older architecture is known as a document type definition, or DTD. You'll find DTDs
typically in older XML vocabularies. The DTD can either appear in the XML document or, more
commonly, be linked to the XML document. I won't be dealing with DTDs at all in this course, but
again, you should know how to recognize them when you see them. The more recent architecture for validating
XML is called XML Schema. The schema language is defined by the World Wide Web Consortium, or
the W3C, but it is implemented by the tools you use to work with XML, such as Java APIs. A
schema is typically tied to an XML document through a namespace string, optionally using
prefixes to refer to those namespaces. Whenever you see an XML document that has something like
xmlns, it's referring to a namespace. The string you see might look like a web address, but it doesn't
necessarily point to a webpage. It's simply an arbitrary string, and it's up to the XML processor to
decide whether that's meaningful or not. In this course, I won't be dealing with XML validation. I'm just
going to focus on reading and creating XML files, but I will touch occasionally on how to deal with
XML files that have namespaces and prefixes. Here are some other important XML terms you'll hear.
Encoding refers to the character encoding of the XML document, which is typically a Unicode format. The most common
format is UTF-8, but you'll also see XML documents that use UTF-16. Your XML processor, in this case
the Java API that you're working with, must be able to deal with the encoding of a particular XML document.
Comments are nodes in an XML document that contain text that can be ignored.
Comments are typically only for human eyes. They look like HTML comments, starting with an
angle bracket, an exclamation mark, and two dashes, and ending with two dashes and the closing
bracket, as in <!-- comment -->. An entity is a string that replaces a reserved character. There are five reserved characters in
XML, and they're not all reserved everywhere. But one very good example of a character that's always
reserved is the ampersand. The ampersand character is illegal unless it's wrapped in a
CDATA section. So to represent it, say in a text node, you'll see it written out as &amp;, starting with an
ampersand, then amp, then a semicolon. In both XML and HTML, all entities start with an
ampersand and end with a semicolon. A processing instruction is an instruction to an XML processor
such as an XML API. This is an example of a style sheet instruction that might be used by a browser
that opens a set of XML content and then applies a style sheet. And finally, the term whitespace refers
to spaces, tabs, and line feeds that separate elements. You'll see a lot of whitespace in most XML files,
because one of the goals of XML is to make it human readable, and when XML is all compacted
together, it's a lot harder for the human eye to comprehend. That whitespace, however, is meaningless
when you're trying to interpret XML as structured data. You'll see in the Java APIs for XML that many
of them let you ignore the whitespace automatically. But other APIs, such as the older Simple API for
XML, will report all of that text to you unless you explicitly turn it off. So that's a review of the
common XML terms that I'll use throughout this course. Again, the most common things that I'll be
dealing with are elements, attributes, and CDATA sections. But the more you know about XML
structure and terminology, the more effective you can be as a developer using XML in your Java
applications.
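To make these terms concrete, here is a minimal sketch that parses a small in-memory document with the DOM API included in Java SE and reads back an attribute, a text node that contains an entity, a CDATA section, and an empty element. The sample XML and the class name XmlTermsDemo are made up for illustration and are not part of the exercise files.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XmlTermsDemo {

    // Made-up sample: XML declaration, comment, root element, attribute,
    // entity (&amp;), CDATA section, and an empty tag.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!-- a comment, only for human eyes -->\n"
        + "<customers>\n"
        + "  <customer id=\"1\">\n"
        + "    <name>Smith &amp; Sons</name>\n"
        + "    <about><![CDATA[Ships <fast> & cheap]]></about>\n"
        + "    <active/>\n"
        + "  </customer>\n"
        + "</customers>";

    // Parses an XML string into a DOM tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        Element root = doc.getDocumentElement();
        Element customer = (Element) root.getElementsByTagName("customer").item(0);
        Element name = (Element) customer.getElementsByTagName("name").item(0);
        Element about = (Element) customer.getElementsByTagName("about").item(0);
        Element active = (Element) customer.getElementsByTagName("active").item(0);

        System.out.println("Root element: " + root.getTagName());
        System.out.println("Attribute id: " + customer.getAttribute("id"));
        // The parser has already turned &amp; back into a plain ampersand.
        System.out.println("Text node:    " + name.getTextContent());
        // CDATA content comes back exactly as written, angle brackets included.
        System.out.println("CDATA:        " + about.getTextContent());
        System.out.println("Empty element has children: " + active.hasChildNodes());
        // The first child of <customers> is the whitespace between the tags.
        Node first = root.getFirstChild();
        System.out.println("First child is a text node: "
            + (first.getNodeType() == Node.TEXT_NODE));
    }
}
```

Note that the whitespace between tags shows up as a text node here, because a DOM parser in its default configuration keeps it.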

Choosing an XML processing API


Java developers have many choices in deciding how to work with XML in their applications. To choose
the right API, you should first decide what's important to you and your application. Some applications
just need speed, the best possible performance. To figure out which API will be fastest for you, you'll
need to take into account which platform or operating system you're using and the size and complexity
of the XML content you'll be working with. Other environments need to pay attention to memory
usage.
In some environments you might have plenty of memory but if you're building an app for mobile
devices, say for Android, you might be constrained. You should also pay attention to ease of
programming, both in the initial development of your app and in long-term maintenance. Some of the
older APIs, such as DOM and SAX, can take more code and be more complex, whereas newer APIs
such as JAXB, the Java Architecture for XML Binding, can take significantly less code.
But you might also need your code to work on Android, and that will put certain limits on your choices. For
example, there are no current implementations for Android of the JAXB and StAX APIs. Here are the
different types of XML processors in Java. Typically, they break down into three categories. Tree-based
APIs, streaming APIs, and binding APIs. A tree-based XML processor represents the entire XML
document as a tree of objects in memory.
This gives you a lot of convenience. You can traverse the tree forward and back, and you can inspect one part
of the tree and then jump to another part pretty easily. One of the great advantages of a tree-based
processor is that you can search the XML content with the XPath expression language or with
tools that are specific to a particular API. But the downside of a tree-based processor is that it
takes more memory, and certain tasks can be a lot slower than with a streaming API.
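As a rough illustration of searching a tree with XPath, the following sketch loads a small document into a DOM tree and uses the javax.xml.xpath API from Java SE to find the names of customers above a given age. The sample data, the class name XPathSearchDemo, and the method namesOlderThan are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSearchDemo {

    // Made-up sample data for this sketch.
    static final String XML =
        "<customers>"
        + "<customer id=\"1\"><name>Ann</name><age>34</age></customer>"
        + "<customer id=\"2\"><name>Bob</name><age>61</age></customer>"
        + "<customer id=\"3\"><name>Cyd</name><age>47</age></customer>"
        + "</customers>";

    // Builds the whole tree in memory, then queries it with XPath.
    static List<String> namesOlderThan(String xml, int age) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(
            "/customers/customer[age > " + age + "]/name", doc, XPathConstants.NODESET);
        List<String> names = new ArrayList<String>();
        for (int i = 0; i < hits.getLength(); i++) {
            names.add(hits.item(i).getTextContent());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(namesOlderThan(XML, 40)); // prints [Bob, Cyd]
    }
}
```

This kind of query depends on the whole document being in memory, which is why it fits tree-based processors rather than streaming ones.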
Examples of tree-based processors include the Document Object Model (DOM) and JDOM. Streaming processors
are designed to build or parse XML one node at a time. There are two kinds of streaming processors,
known as pull processors and push processors. The Simple API for XML, or SAX, is a push processor.
That means it pushes the data into callback methods that you write. In contrast, the Streaming API for
XML, or StAX, is a pull processor, where you loop through the data and only call methods that are
meaningful to you.
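The difference between push and pull can be sketched by counting elements both ways. In the SAX version the parser drives and calls back into a handler; in the StAX version the calling code drives the loop and asks for the next event. Both APIs ship with Java SE; the class name PushVsPullDemo, the method names, and the sample XML are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PushVsPullDemo {

    static final String XML = "<customers><customer/><customer/><customer/></customers>";

    // Push (SAX): the parser reads the document and calls startElement for us.
    static int countWithSax(String xml, final String elementName) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(elementName)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    // Pull (StAX): our loop asks the reader for each event and ignores the rest.
    static int countWithStax(String xml, String elementName) throws Exception {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(elementName)) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("SAX:  " + countWithSax(XML, "customer"));
        System.out.println("StAX: " + countWithStax(XML, "customer"));
    }
}
```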
Typically, pull processors give you a more convenient programming model. But both types of
streaming processors can be incredibly fast and highly memory efficient. The downside of a streaming
processor is that because the complete data set isn't stored in memory at once, you can't do XPath-style
searches. Also, the coding can be complex, especially for the Simple API for XML, or SAX.
And the SAX API is a read-only API; it knows how to parse XML but not how to create it.
But as you'll see, if you want a streaming API for creating XML, you can use StAX, unless you're
working on Android. And there's one other streaming API that's worth mentioning, called the
XmlPullParser. This is an API that's been implemented in Android. So, if you like the streaming model
and you are working in Android, the XmlPullParser is one possibility. The binding processors are
similar to DOM in the background; that is, they're tree processors that store all the data in memory at the
same time. But the programming model is dramatically different.
To use a binding processor such as JAXB or the Simple XML Serialization framework, you take Java
classes, POJOs, and you annotate them, indicating which properties or fields of a Java class are mapped
to portions of your XML structure. And then you run very simple code to either serialize or deserialize
XML content. The upside of a binding processor is that it's a very efficient programming model and it's
very easy to maintain. About the only downside is that JAXB, the binding processor that's included
with Oracle's JDK, is not available in Android. But there's a binding processor that does work in
Android. It's called the Simple XML Serialization framework. That's different from the Simple API for
XML, which can be confusing. It's an open-source library that you can add to your Android apps,
and it works quite well there. As you've seen so far, the world of working with XML in Java is an alphabet
soup of acronyms such as DOM, JDOM, JAXB, and so on. One of the acronyms you'll see frequently
is J-A-X-P, or JAXP. This stands for the Java API for XML Processing, and it's an umbrella term that
describes the standards for the XML APIs that are included in Java SE. These include
SAX, the Simple API for XML; DOM, the Document Object Model; StAX, the Streaming API for XML;
TrAX, the Transformation API for XML; and JAXB, the Java Architecture for XML Binding. So when you hear
the term Java API for XML Processing, or JAXP, you're not referring to a specific programming model.
It's the entire set of APIs that are available in Java SE without having to go and get a third-party library.
If you're an Android developer, these are the APIs that specifically work in Android. SAX and DOM
are included in the Android runtime. The XmlPullParser is also included in the Android SDK, although
it's not a part of the Java API for XML Processing. And finally, third-party libraries that work fine in
Android include JDOM (you'll need 2.0.1 or later) and the Simple framework, which also needs a JAR
file. The APIs that don't work in Android are JAXB and StAX. And as you can see here, there are
alternatives that give you similar styles of programming and similar benefits. And there are other XML
APIs for Java developers that I don't cover in this course. These include XOM which you can find at
xom.nu, dom4j which you can find at dom4j.sourceforge.net and XStream. I haven't included these
libraries simply to manage the length of the course, I had to make some choices. But there are
advantages and disadvantages to these APIs as well, and they're worth checking out. As you decide
which API to use for your application, you might want to do some benchmark tests, finding out how
fast an API will be and how much memory it will use. Don't depend on the benchmark tests that are
offered by the vendors or by other developers; do your own. Test on a platform that's as similar as
possible to what your users will use. If you're building an app for Android, test on a variety of hardware.
If you're building for a server environment, use the same server that you'll use in production, on similar
hardware. Test with XML content that matches the size and complexity of the XML that you expect to
encounter. And for application server environments, such as Java EE servers, test multi-user scenarios.
Make sure you are using Java code in a way that works in the multi-threaded environment that you'll
be working in. And finally, do multiple test runs; don't depend on a single test. Do multiple runs through
each of your scenarios and then take the average. There are too many factors that can cause a single test
to not be representative of what you'll actually see in production. Through your benchmarks and
through your understanding of the relative ease or complexity of the APIs I'll be covering in this
course, you'll have plenty of choices, and you should be able to choose the API that's best for your
application.
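The advice about multiple runs and averaging might be sketched as follows. This is only a rough harness, not a rigorous benchmark: the Task interface, the class name ParseBenchmark, the generated XML, and the run counts are all assumptions made for the example. It also adds a few untimed warm-up runs before measuring, which is not something the course describes, and it measures time only, not memory.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ParseBenchmark {

    // A unit of work to time (hypothetical helper type for this sketch).
    interface Task {
        void run() throws Exception;
    }

    // Generates test XML of a chosen size; substitute content like your real data.
    static String buildXml(int customers) {
        StringBuilder sb = new StringBuilder("<customers>");
        for (int i = 1; i <= customers; i++) {
            sb.append("<customer id=\"").append(i).append("\"><name>Customer ")
              .append(i).append("</name></customer>");
        }
        return sb.append("</customers>").toString();
    }

    // Runs the task several times and returns the average time in milliseconds.
    static double averageMillis(Task task, int warmups, int runs) throws Exception {
        for (int i = 0; i < warmups; i++) {
            task.run(); // untimed, lets the JVM settle (an assumption of this sketch)
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (runs * 1000000.0);
    }

    public static void main(String[] args) throws Exception {
        final String xml = buildXml(50000);
        Task dom = new Task() {
            public void run() throws Exception {
                DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            }
        };
        Task sax = new Task() {
            public void run() throws Exception {
                SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            }
        };
        System.out.printf("DOM average: %.1f ms%n", averageMillis(dom, 3, 10));
        System.out.printf("SAX average: %.1f ms%n", averageMillis(sax, 3, 10));
    }
}
```

The numbers it prints depend on the machine, the JVM, and the data, which is the reason for running it on hardware similar to production.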

Exploring the sample data provider project


In many of this course's exercises, I'll be using existing data and serializing, or marshaling, it to XML.
So, I'll need an existing data source. Instead of a full database, I've provided a set of JSON and XML
files. They're wrapped up in an Eclipse project. In Eclipse, select File > Import, choose Existing Projects
into Workspace, and click Next. Choose Select Archive File and click Browse.
Go to the exercise files' 01_GettingStarted folder, choose DataProvider, and follow the rest of the
prompts to import it. The DataProvider project has a couple of classes, including a DataProvider
class, a customer class in the model package, and a stopwatch class in the utilities package. I'll use all
of these classes during the demonstrations. And most importantly, here are the data files. The one I'll
use most frequently is customers.json.
It's a JavaScript Object Notation file that contains 1,000 data items. Each item has an id, a name, a
phone, about, age, balance, active, and joined, with a variety of data types: a couple of integer numbers in
the id and age, a numeric value wrapped in quotes for the balance, a boolean value for active, and a
date for joined. There's also an XML version of this data set called customers.xml, and one with
namespaces called NScustomers.XML. This has a namespace with a prefix. There are also files
marked large. These files have 50,000 rows of data each, and in fact, if you try to open them directly in
Eclipse, you might cause memory errors. So instead, open these files in an external text editor. I've set
up my system so that TextPad is my standard text editor for XML files. And so I can right-click on a
large XML file and choose Open With, and then instead of choosing Text Editor or XML Editor, I'll
choose System Editor. That opens the file in the system's default editor for XML files. If you're
working on Windows you can use TextPad, or if you're working on Mac you might try TextWrangler.
Either way, for these large files, these will serve you better than the built-in editors in Eclipse. In
order to get the data to the other projects, there's a class called DataProvider. It depends on a library
called json-simple, which is free, open source, and very small. Here's what the DataProvider class
does. First, it has some constants: a DATADIR string that indicates where the data is stored relative to
the current project, and then three numeric constants, SMALL, MEDIUM, and LARGE. You can use these
to indicate how much data you want to retrieve. There's a method called getData, which returns a list
containing instances of the customer class. We'll look at that class in a moment. The getData method
receives an integer argument named limit. If the limit is 50,000, or LARGE, it opens the large customers
JSON file, and otherwise it opens customers.json, which has 1,000 records. It then uses the json-simple
library, with classes like JSONArray, JSONParser, and JSONObject, to parse the JSON file and
retrieve its data. Then there's some code to transform the JSON data into native Java data. For each
item in the JSON array, there's code to create an instance of the customer class and set its properties.
The customer class is a plain old Java object. It has eight private fields, again with a variety of data
types, a set of string constants that match the names of the fields in the JSON and XML files, getters
and setters, and then down at the bottom of the class, a toString method that can output some of the
information about the current customer. The DataProvider class has a main method that's only there for
testing the code. When you run this class directly, you'll end up running the main method, you'll
retrieve the data from JSON and receive it as a list of customer objects, and then you'll output some
information, including how much data was retrieved, information about the first ten customers, and how
long the operation took. The timing is done by the stopwatch class, which has a start and a stop method
and uses some pretty simple Java code to time the operation. The default call to getData uses the
SMALL constant, meaning only ten data items are being requested. When I run the code, I retrieve the
data and display those ten items, and I see how long the operation took: on my computer, 135
milliseconds. If I change the argument to MEDIUM and run it again, it takes a little bit longer, 264
milliseconds. And I see that I've received 1,000 data items, although I'm only outputting the first ten.
And finally, if I put in a value of LARGE and run the code, it'll take significantly longer, because now
I'm opening a very large file with 50,000 data items. And I see that the retrieval from JSON took almost
two seconds. So that's all the data files and the code in the data provider project. I'll show you how to
use the data provider project's code and data files as we get into some of the later exercises in this
course.

Reading XML as a string


In most of this course, I describe how to use a variety of Java-based XML APIs to handle the parsing of
XML files for you. But before we get into those APIs, it's a good idea to review how to read a simple
text file, from disk or any other location. I'll start by creating a new Java project, selecting File > New >
Java Project, and I'll name it XMLStrings. I'll click Finish, and then in the new project I'll go to the
source folder, right-click, and create a new class, which I'll name Main. I'll put it into a package called
com.example.xmlstrings. I'll select the option to create a main method and click Finish to create the
class. Now, I want to use some of the capabilities of my DataProvider project. In the previous movie I
described how this project works. It has a lot of useful tools, including the DataProvider
class, which has some useful static constants and methods, and the stopwatch class, which can
be used to time certain operations. I'm going to add the DataProvider project to my XMLStrings
project's build path so I can use the DataProvider project's classes. I'll right-click on XMLStrings
and choose Properties, then I'll go to Java Build Path and click the Projects tab. I'll click Add, select
DataProvider, click OK, and click OK again, and now everything in the DataProvider project is
available to XMLStrings. I'll go back to that Main class. My goal here is to simply read an XML file
into memory as a string. I'm going to create a string that I'll call fileName, and I'll start it with a
constant of the DataProvider class named DATADIR. I'll start with the name of the class,
DataProvider.DATADIR, and then I'll append to that the name of the file I want to use, customers.xml.
I'll expand my editor to full screen, and next I need a way to collect the string one character at a time.
I'll use the StringBuilder class from java.lang. I'll create an instance of it named builder, and I'll use the
no-arguments constructor. Next, I need a class that can read the file from disk. You can choose
either FileReader or FileInputStream. Typically the reader class is better for pure text. So I'll create a
FileReader object, which I'll name reader, and I'll instantiate it by calling new FileReader. I'll pass
in an instance of the File class, and there I'll pass in the file name. An error appears, so I'll use a quick
fix. In Eclipse on Windows, press Ctrl+1, and on Mac, press Cmd+1. I'll choose Add throws
declaration so I can keep the code as simple as possible. To keep my throws clause as simple as
possible, I'll have it throw the Exception class, so I won't need to refer to any of the subclasses of
Exception. So now I have a StringBuilder to collect the string and a FileReader to read the file. I'll create
an integer variable that I'll name content, and then add a while loop based on a condition. The condition
will look like this. Each time through the loop, I'll call the reader object's read method, and it'll return
one character as an integer. I'll start with another pair of parentheses, and I'll use this statement: content
equals reader.read. That returns either the next character in the file or, if you're at the end of the file, a
value of negative one. So I'll evaluate the response with not equal to negative one. Now each time through
the loop, I'll receive a single character as an integer. I'll take that value and append it to the string
builder, using builder.append, and I'll pass in content. But to make sure that the character shows up as
a character and not as a number, I'll cast it to char. Once the loop is complete, I'll close the FileReader object.
And then I can get my string by calling the StringBuilder object's toString method. I'll do some system
output. I'll type sysout and press Ctrl+Space and output builder.toString. And that's all the code you
need to read the file from disk and put it into a string value. I'll test the code by clicking the Run button,
and there's the result. I'm now outputting my XML file to the console. And that's great. But again, it
doesn't accomplish the goal of getting structured data from an XML file and turning it into a set of Java
objects. That's what most of this course is about. And you'll find as we get into the various XML APIs
that I cover in the course that they all protect you from the internals of file readers and file input
streams. You'll never really have to do this kind of looping and reading one character at a time, because
the APIs handle that for you. The emphasis will always be on examining the elements of the XML file,
or the underlying data structure. So let's get started with the first API, known as the simple API for
XML. And I'll describe that in the next chapter.
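The character-by-character loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name ReadFileAsString, the readFile helper, and the temporary sample file are hypothetical, and the sketch uses try-with-resources to close the reader rather than an explicit close call.

```java
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical class name; not part of the course's exercise files
class ReadFileAsString {

    // Reads the whole file into a String, one character at a time
    static String readFile(String filename) throws IOException {
        StringBuilder builder = new StringBuilder();
        // try-with-resources closes the reader even if read() throws
        try (FileReader reader = new FileReader(new File(filename))) {
            int content;
            // read() returns the next character as an int, or -1 at end of file
            while ((content = reader.read()) != -1) {
                // Cast to char so the character itself is appended, not its number
                builder.append((char) content);
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample content written to a temp file so the sketch is self-contained
        File sample = File.createTempFile("customers", ".xml");
        sample.deleteOnExit();
        Files.write(sample.toPath(),
                "<customers><customer id=\"1\"/></customers>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readFile(sample.getPath()));
    }
}
```

Without the cast to char, append would receive an int and add the character's numeric code to the string instead of the character.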

PART 2: PARSING XML WITH SAX (SIMPLE API FOR XML) 45m
How SAX works
The first API I'll describe in this course is SAX or the simple API for XML. SAX is a streaming API.
It's a read-only API so you can use it to parse XML content, but it does not have the ability to write out
or serialize XML. SAX is an event-based parser. As it reads an XML file, it emits events, and then you
capture those events with your own code. It's one of the very earliest XML APIs, and it's called the
Simple API for XML, because when it was created, it represented a much simpler approach to reading
XML than hand-parsing plain text.
It was originally created by David Megginson, but it's a completely open source API. And in fact, it's
bundled with pretty much all versions of Java. The SAX API, as a streaming processor, is much faster
and can use much less memory than a tree-based processor such as DOM. There are two kinds of
streaming processors, push and pull parsers. SAX is a push parser. That means that the primary control
of the parsing process is handled by code that you don't own as the developer.
As that process reads the XML content, it pushes the data into your custom code that's encapsulated in
callback methods that you define. SAX works fine on Android, and in fact is bundled as part of the
Android runtime. Some developers like to use it, while others prefer the XML pull parser that's also a
part of Android. Both are streaming processors, but they represent very different coding models. All
streaming processors have certain benefits.
A streaming processor is a forward only processor. It only knows how to go from the beginning of the
XML file to the end of the XML. As the XML is read into memory, the processor emits events to share
the data with the developer. But after each event is handled, the data that's associated with that event
can be discarded from memory by the processor. So, streaming processors are capable of handling very
large XML content. The entire document doesn't have to be in memory all at the same time.
In order to read XML with SAX, you'll deal with events to read the data. As the SAX parser moves
forward through the XML content, it will emit an event for each significant node in the XML file. Some
of the most commonly used events to get the data from XML include the startDocument and
endDocument events, startElement and endElement, and the characters event, which reports when
usable text is available.
There are also error handling events, including warning, error, and fatalError. These events are handled
for you by the superclass if you ignore them, and a fatalError is just that. It'll stop the processing in its
tracks. If you want some custom handling of the errors, you would override the methods for these
events. Other events that are available include notations, processing instructions, ignorable white space,
and entities. I won't cover these additional events in this course, but they're available if you need
them for more complex XML content. To work with SAX, you'll create a custom Java class that
extends a class called DefaultHandler. This is the superclass for your event handler, and it has
implementations of each of the event methods I described, such as startDocument, endDocument,
startElement, and so on. When you extend the DefaultHandler class, you inherit all of its methods. And
then if you want to handle any particular data, you override those methods and create your own custom
code. So this is an example of a start document method. As the parser object starts to read the XML
content, it'll call this method. And many of these methods will receive arguments that give you data. It's
up to you to design your code to capture the data and save it in some way. In this chapter's movies, I'll
show you some strategies for doing that. To launch the parsing of a document, you'll create an instance
of a class called SAXParser. And, you'll create an instance of your handler class. Then, you'll call the
parse method of the parser. When you call the parse method, you can pass in a file, an input stream, or a
number of other sources. And then you pass in your handler object. The parse method then does its
thing: reading the XML file and calling the methods of your handler object. Here are some things to
know about the SAXParser. As I mentioned, it's up to you to figure out how to track the data. Each of
the event call back methods is called individually. There's no automatic sharing of data between those
methods. So you'll need to create fields in your handler class to store data as it's collected. Again, I'll
show you some strategies for this. Another thing to watch out for is that the characters event in SAX
can be called more than once, even if there's only a single text node. One of the most common things
you'll see is that if a text node has an entity, such as &amp;, some SAX processors
will call three characters events, one for the text before the entity, one for the entity itself, and one for
the text after the entity. So it's up to you to design code that can capture that text for each event and
then concatenate it together. So those are some things about the nature of this SAXParser. In the next
set of movies, I'll show you some sample code for parsing XML files with SAX.
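As a rough sketch of the model just described (a handler that extends DefaultHandler, keeps its results in fields, and is passed to the parse method), something like the following could work. The class name ElementCounter, its helper methods, and the sample XML are assumptions made for illustration. Here the parser is given an input stream built from a string rather than a file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts elements and gathers all text
class ElementCounter extends DefaultHandler {

    // Callbacks share nothing automatically, so results live in fields
    private int elementCount;
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startDocument() {
        // Reset state so the handler can be reused for another document
        elementCount = 0;
        text.setLength(0);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        elementCount++;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node, so always append
        text.append(ch, start, length);
    }

    int getElementCount() {
        return elementCount;
    }

    String getText() {
        return text.toString();
    }

    // Creates a parser and a handler, then lets the parser drive the callbacks
    static ElementCounter parse(String xml) throws Exception {
        ElementCounter handler = new ElementCounter();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        ElementCounter result = parse("<note><to>Tom &amp; Ann</to></note>");
        System.out.println(result.getElementCount() + " elements, text: " + result.getText());
    }
}
```

Because characters always appends, the text comes out whole regardless of how many characters events the parser chooses to fire around the entity.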

Creating a SAX event handler class


The first step in parsing an XML file with the Simple API for XML is to create an event handler class. I'll
show you how to get started with this in the project SAXEventHandler. In this project, there's a main
class called ReadXMLWithSAX. It has a main method with a file name variable, which is
constructed from the DataProvider class's DATADIR constant and the name of the file, customers.xml. The
main method has a throws clause with Exception because we'll be dealing with some exceptions as we
work with the Simple API for XML.
There's also a plain old Java object, the Customer class. This version of the Customer class has just the
setters and getters that are needed, the eight private fields, the constants for the names of the elements
in XML or in JSON, and down at the bottom, the toString method to output the customer
information. To read data with SAX, you create a class that extends the class called DefaultHandler.
And we'll do that with this class, SAXCustomerHandler.
Right now this class has two private members, a list of customer objects named data and a constant
called XML date format. We'll get to that later. It also has a method called readDataFromXML that
receives the name of the file to be read and returns the data object. To implement the Simple API for
XML, take this class and add an extends clause. Extend the class called DefaultHandler, which is a
member of the package org.xml.sax.helpers.
When you select the class, an import statement should be added at the top. Now, here's how SAX
works. The default handler class is a concrete class that has a set of default methods, but in order to
handle any of the events that happen as an XML file is read, you override those methods. So now that
we've extended the class, we can add the appropriate overrides, and I'll start by adding five overrides.
I'll place the cursor after the readDataFromXML method, press Ctrl+Space, and select the
startDocument method.
That creates an override version of that method. Within the method, I'll get rid of the comment and the
call to the superclass's version of the method, and I'll replace it with some system output and I'll
output the string, Start document. Now I'll do the same thing and add an override of the endDocument
method. I'll press Ctrl+Space. And I'll type end, and choose endDocument. And then I'll add some
console output the same way I did with startDocument.
With sysout and then the string End document. Now I'll do the same thing for three more events. I'll
override the startElement method. There are a couple of versions. And I'll choose this one, which
receives four arguments named uri, localName, qName, and attributes. I'll once again remove the
comment and the call to the superclass's method. And now here's what's going to happen. As the
SAXParser encounters a start element, it'll trigger a call to this method.
And it'll pass in the name of the element and a collection of attributes. For an XML file without any
namespaces or prefixes, the name of the element will be in the qName argument. So I'll add some
console output, and I'll output Start element and then I'll append to that qName. Next I'll add an
override for the endElement event and I'll do the same thing. I'll copy and paste that output code. And
I'll change the label from Start element to End element.
Finally, I'll add an override for the characters event. The characters event is triggered whenever a string
of characters is encountered. This could be white space, that is, spaces, tabs and line feeds, or it could
be meaningful data. And it's possible for the characters event to happen more than once between a start
element and an end element. Say that you have a string consisting of plain text plus an entity, such as
an ampersand, and then some more plain text.
In some environments that could trigger the characters event three times. For this exercise we won't
worry about that, I'll just override the event and then add output to say that the event happened. So now
we have a useable event handler for SAX. The next step is to go back to the method
readDataFromXML, and add the code that will actually parse the document. We'll use two classes here
named SAXParserFactory and SAXParser. Start with the factory class.
I'll type the beginning of the class name and I'll choose SAXParserFactory and I'll name this object
factory, and I'll get its reference by calling a static method. SAXParserFactory.newInstance. Now, at
this point, I could change the behavior of the factory by calling one of these methods. There's a method
called setNamespaceAware, one called setValidating, another called setFeature, and so on. But I'm
not going to do that. I'm just going to use a default factory object.
And I'll use it to create a parser object. I'll declare a new object of type SAXParser, and I'll name it
parser, and I'll get its reference with the method factory.newSAXParser. And now, I have a parser
object. The next step is to tell the parser object to parse the file. I already have the file name; it's being
passed in here as an argument of the readDataFromXML method. So I'll wrap that in a File object and
pass it to the parser with this code: parser.parse. For the first argument, I'll pass in a new File
object, and wrap it around the file name.
Be sure to add an import for the File class. The next argument is the event handler object, which must
extend DefaultHandler, and this class is the one that's doing that, so I'll simply pass in this, meaning use
the current object to manage all the events that the SAXParser will emit. Now I have some error
indicators, so I'll deal with them by pressing Ctrl+1 here, or on a Mac, Cmd+1.
And I'll add a throws declaration to the method signature. And notice that there are two possible
exceptions: SAXException, which can be thrown by the SAX event methods, and IOException,
which can be thrown while the file is being read. I have an error on the next line too, and I'll add a throws
declaration for that, and I'll get a ParserConfigurationException. So those are all the exceptions that can be thrown by
the code I have so far. So here's what this class is doing so far.
It has a public method called readDataFromXML, which receives the file name. It creates the
SAXParserFactory and the parser, and then parses the file. As the file is parsed, the parser object calls
all of these other methods as call back methods. At the beginning of the document, it calls the start
document method. At the beginning of each element and the end of each element, it calls those
methods, and so on. So now we'll go back to our main class, ReadXMLWithSAX, and we'll call this
class and this method.

I'll create a new instance of the class, SAXCustomerHandler. That's the one I was just working on, and
I'll name it saxHandler, and I'll instantiate it by calling its no-argument constructor. Then I'll call the
object's readDataFromXML method, and I'll pass in the file name that's already been defined above. I'll
get rid of this SuppressWarnings annotation. I don't need that anymore. And now I'm ready to save
and test the code. When I run the code, I see a whole string of output in the console.
The top of the output has already been lost, but if you scroll down you'll see a pattern emerge. Here's a
start element for the customer element. Some characters which would be white space. Start element for
name and some characters and then End element for name. Start element for phone, some characters
and End element. And again you'll see a bunch of characters events happening which are triggered by
white space between End elements and Start elements. In the next exercise, I'll show you how to write
the code that can figure out when the characters event is meaningful and when it isn't,
and how to track all this information and store it, so that you can put it into a form that's meaningful
for your Java-based application.
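The handler built in this movie might look something like the sketch below. It is not the course's exercise file: the class name EventLoggingHandler and the readEvents method are hypothetical, the XML comes from a string instead of a file, and the events are collected in a list as well as printed so the order is easy to inspect.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records each SAX event instead of collecting data
class EventLoggingHandler extends DefaultHandler {

    private final List<String> events = new ArrayList<>();

    // Parses the XML text and returns the events in the order they fired
    List<String> readEvents(String xml) throws Exception {
        events.clear();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        // "this" is the handler: the parser calls back into the overrides below
        parser.parse(new InputSource(new StringReader(xml)), this);
        return events;
    }

    @Override
    public void startDocument() {
        events.add("Start document");
    }

    @Override
    public void endDocument() {
        events.add("End document");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        events.add("Start element: " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("End element: " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Fires for whitespace between elements as well as for real text
        events.add("Characters");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<customers>\n  <customer id=\"1\"><name>Ann</name></customer>\n</customers>";
        for (String event : new EventLoggingHandler().readEvents(xml)) {
            System.out.println(event);
        }
    }
}
```

Running it on the small sample shows the same pattern as the console output described above: characters events appear both for the name text and for the line breaks and indentation between elements.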

Tracking XML elements in SAX handlers


In a previous movie, I showed you how to create a SAX event handler class that has override methods
for events like start document, end document, start element, end element and characters. The next step
is to start tracking the data as it's collected. One good strategy is to create objects that are fields of the
current class. And then you can use those objects to track your data. In this version of the project, in
SAXCustomerHandler, I've already declared a list which will contain instances of the Customer class.
I want to initialize that list when the start document event happens. That'll make sure that if this object
is used more than once, we're always starting off with a fresh list. So I'll go to the startDocument
event, and we're going to comment out the existing code. And I'll initialize the object with data equals
new, and I'll use the concrete class, ArrayList. Now, I have a place to put customer objects as the XML
file is read into memory. The next step is to figure out where I am in the document at any given time, and for
that I'll need a string variable that I'm going to call currentElement.
I'll declare it as a member of the class, and I'll place it right up here, under the list. I'll declare it as
private and its data type as String, and I'll set its name to currentElement and I'll initialize it to an empty
string. Next, I'll go down to the startElement event, and this is where I was outputting the element
name. Once again, I'll comment out the code that's outputting the name to the console. And here, I'll
set the value of the currentElement field that I just declared to qName.
And that's the argument that contains the name of the current element in an XML file without name
spaces. Then, to clear that information, I'll go to the end element event, I'll comment out my console
output and I'll reset current element to an empty string. My goal is to know where I am in the XML file
at all times. The only time I'm going to care what the element name is, is when I've collected a certain
amount of character data, but I'm also going to need a place to put data as it's collected.
And to do that, I'll create an instance of the customer class. Once again, this object will be shared
among multiple event methods; so, I'll add another field at the top of the class. Once again, it's private,
and this is an instance of the customer class that's a part of the current project. Now notice I have one
over in the data provider project too. It has a subpackage of model; don't use that one. Use the one with
the package com.example.sax.read, that's part of this project.
And I'll name the object simply customer. Then I'll go back down to my start element event. And I'll
add a little bit of code here. I'll add a switch statement, and I'm going to evaluate currentElement,
the field that I just set from the qName argument. I'm working in Java 7, so I can use a switch statement
on a string. If you were doing this in Android, you'd have to follow another programming model. I'll
evaluate currentElement, and first, I'll look for the name of the root element of the XML file.
That's customers. And if that's the value of the current element, I'm just going to ignore it. I'll issue a
break statement, and continue on. Now I'm going to duplicate this bit of code. And for the second case,
I'll look for the customer element. When I get to the start element for customer, I care, because now
I'm starting a new data item. And I'm going to follow some simple steps. First, I'll initialize that
customer object that I declared as a field of the current class.
I'll say customer equals new Customer. And next I'm going to get my first bit of real data, the customer
ID. The ID is an attribute of the XML file, and specifically, it's an attribute that's part of the customer
element. When the start element event happens, the method receives this argument, the attributes
argument, and it's an Attributes object. I'll create a String variable and I'll name it
idAsString, and I'll get its value by calling the attributes object's getValue method.
Notice that there are a few versions of this. I'm looking for the one with the name id as a string. Now I
could type it in like that, or I can use a constant that's a part of my plain old Java object, the Customer
class, and for consistency I'll do that, using Customer.ID. So now I have the ID as a string value, and I'm
going to need to parse it as an integer, and then pass it into the customer object. The customer object
has a setter method called setId that accepts an integer.
To turn the value from a string to an integer, I'll call Integer.parseInt and I'll pass in idAsString. So
now I have a new customer object and its ID property has been set. And finally, I'll add the customer
object to my list, which is named data. So, to review the strategy, I've declared the customer object as a
field of the current class, so it persists for the lifetime of the handler object. When I get to a new customer, I
initialize it, as a new instance of customer.
I know I'm in the right element, so I know that the id attribute is there. I call the attributes object's
getValue method, pass in the name of the attribute, and get its value back as a string. I transform it into
an integer by parsing it, and then I add that value to the customer. And finally, I add the customer object to
the list. And that's all the data that I'm going to collect in this exercise. Now I'm going to go back to my
main class, which is calling my event handler class. That's ReadXMLWithSAX.
And I'm going to modify the main method in this class. I'm going to create a new list of customer
objects. I'll name it data and I'll get its value by calling readDataFromXML. Then I'll do a system
output. And I'll output number of customers, and I'll append to that the size of the list with the
expression, data.size. And let's see how we did. I'll run the code, and at the end I'm told that there are
1,000 customers. But we can do a little bit better than that.
Now that we have all the customers, we can output some information about them. I'll go to a new line,
and I'll add a foreach loop. I'll type foreach and press Ctrl+Space, and choose the foreach code
template and then, within the for loop, I'll use system output and I'll output customer. And then I'll
append to that customer.getId. And this is the ID field of the customer object, which was set from the
attribute in the XML file.
I'll run that code and there's the result. I'm getting the numeric values from the XML file that are turned
into integers for data storage in memory. But turned back into strings for output. So, here's what we've
accomplished so far. In our handler class, we have fields that we're using to track where we are in the
file. We have a customer object to track each data item. And we have a string named current element
that we're using to track which element is currently being parsed. The next step is to start collecting text
from the XML file.
And that takes some special strategies. So I'll show you how to do that in the next movie.
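The tracking strategy from this movie could be sketched as follows. Treat it as an illustration only: the nested Customer class is a minimal stand-in for the course's Customer class, the handler name CustomerIdHandler is hypothetical, and the XML is read from a string rather than from customers.xml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that collects one Customer per <customer> element
class CustomerIdHandler extends DefaultHandler {

    // Minimal stand-in for the course's Customer class (an assumption)
    static class Customer {
        static final String ID = "id";
        private int id;
        int getId() { return id; }
        void setId(int id) { this.id = id; }
    }

    // Fields shared by the callback methods
    private List<Customer> data;
    private Customer customer;
    private String currentElement = "";

    List<Customer> readDataFromXML(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), this);
        return data;
    }

    @Override
    public void startDocument() {
        // Fresh list for every parse, even if this handler is reused
        data = new ArrayList<>();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        currentElement = qName;
        switch (currentElement) {
        case "customers":
            break; // root element: nothing to collect
        case "customer":
            customer = new Customer();
            String idAsString = attributes.getValue(Customer.ID);
            customer.setId(Integer.parseInt(idAsString));
            data.add(customer);
            break;
        default:
            break;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        currentElement = "";
    }

    public static void main(String[] args) throws Exception {
        String xml = "<customers><customer id=\"1\"/><customer id=\"2\"/></customers>";
        List<Customer> data = new CustomerIdHandler().readDataFromXML(xml);
        System.out.println("Number of customers: " + data.size());
        for (Customer c : data) {
            System.out.println("Customer: " + c.getId());
        }
    }
}
```

The switch on a String requires Java 7 or later, as noted above; on older platforms a chain of if/else tests with equals would do the same job.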

Capturing text values in SAX handlers


So far, my work with SAX has involved creating the event handler and adding some fields to the class
that let me track data and keep track of where I am in the XML file. The next step is to start capturing
more data. In my handler class, in the startElement method, I have a switch statement where I've
handled two of the elements that occur in this XML file, customers and customer. Going back to the
XML file, I'll see that all of the other elements in this file, such as name, phone, about, and so on have
actual data as text elements.
Most of them are simply text, which is known, in XML vocabulary, as a text node. But the about
element has something called a CDATA section. When you are working with SAX, though, it's important
to know that it's possible for there to be multiple text nodes within a single element. And each of these
can trigger its own characters event, so tracking text is not as simple as saying, just give me the text of
the current element. You have to first start by knowing you've hit a startElement event, and you have to
start capturing text there, and then, each time a characters event happens, you have to append that text
to the text you started collecting.
And then, in the endElement event, you have to do something with that text. So, let's go back to our
handler class. We're working with a new version of the project named SAXCaptureText, and it picks up where
the last exercise left off. The first step is to create an object that we can use to collect text. I'll go back
up to the top of my handler class. And I'll add another new field, I'll declare this private and I'll use an
instance of the StringBuilder class, and I'll name it currentText.
Now, each time I hit a startElement event, if I'm in an element other than customers or customer, I'll
initialize that StringBuilder. That is, I'll create a new instance so I can start collecting text from scratch.
I'll place that code here, in the default section of the switch statement. Before the break statement, I'll
say currentText equals new StringBuilder. And now essentially I'm starting with an empty string. Next,
I'll go to the characters event.
Again, this event can happen more than once. In most cases, it's only going to happen once, but you
want to be on the safe side. First, I will add an if statement, and I'll set the condition to ask if
currentText doesn't equal null. The characters event passes in a character array, a starting integer, and a length
integer. And to append the current characters value, use this code: currentText.append, and use the
version of the append method that accepts those three types of arguments:
a character array, a start value, and a length value. And pass in the arguments that you received in the
event method. I'll comment out the code that's outputting the string characters to the console, and now,
each time I hit this event, I'm appending the text to the currentText object. Finally, we'll deal with the
text in the endElement event. In the endElement event, I'm told which element I'm leaving. So, I'll
place the code here before I set currentElement to a blank string.
And first, I'll check to see whether I'm leaving customers or customer, and if so, I'll just return from the
method. I'll add an if statement, and I'll say if currentElement.equals, and I'll pass in a value of
customers, the root element. Then I'll put an OR operator and I'll see if currentElement has a value of
customer, and if either of those is true, I'll simply return from the method. Next, I'll create a String
variable that I'll name content, and I'll get its value from the StringBuilder that's declared as a field at
the top of the class. That's where I've been collecting my text value.
Now I'm ready to do something with that value. And once again, I'll use a switch statement. I'm
checking the currentElement before I leave it. So, that's the expression that I'll evaluate. And now, I'm
going to look in turn for each of the elements that's a child of customer. And for each of them, I'll take
the current text value and add it to the current customer object. I'll start with the name element. Just as I
did with the id, I'll use the constant of the Customer class. This has a value of name as a lowercase
string, which matches the element names in the XML file.
If that's the case, then I'll take the content value and add it to the customer object, using the setter
method setName. And I'll pass in content. Now, I'm going to take that bit of code and copy it, and paste
it in a few times. And for each of these I'll change the name of the element I'm looking for and the
method that I'm calling. And I'm dealing first with all the string-based properties. Next I'll deal with the
phone, and I'll call the setPhone method to pass the value in.
Then I'll deal with the about element, which has a CDATA section instead of a text node, but it's going to
behave the same as the text nodes. And those are the three string-based values. Next I need to deal with
some other data types. I will start with the age, which is an integer. I will look for the age element, I call
the setAge method, and I get an error, and that's because the setAge setter method is looking for an
integer. There are a few ways to deal with this. For example, I could add a new setter method to the
customer class.
But instead I'm simply going to parse the value and turn it into an integer here. I'll use Integer.parseInt
and I'll parse the content variable. And now that method works. I'll do something similar with the
active element. This is the name of the element, Customer.ACTIVE, and the name of the setter method,
setActive, and now I'll once again parse it, but this time I'll use Boolean.parseBoolean. The balance
field is a BigDecimal and it requires a slightly different coding model. Once again I'll paste in a case
statement. I'll say I'm looking for the balance. And this time, instead of a parse method, I'll use the BigDecimal
class's constructor with new BigDecimal, and I'll pass in the string value of content.
And that's because this class has a constructor that knows how to do that. I change the name of
the method I'm calling to setBalance and that works fine. So, now I've handled seven of the eight fields
and the last is the date. That takes a little bit more work. Once again I'll add a case statement and this
time I'll look for joined, that's the name of the element containing a date. But now I have to parse the
date and turn it into a Java Date object. To do that, I'm going to use a value that was already a part of
this class, called XML date format. It has a string that matches the format of dates in my XML file. If I
go look at the XML file, I'll see that all of the dates have a format of the year's four digits, the month,
and the day, a T in uppercase, and then the time, which is all zeros, and that's matched by this format
mask. So, going back down to this case statement, I'll create an instance of the Java DateFormat class.
It's a member of the package java.text. I'll name it df and I'll instantiate it with this code, new
SimpleDateFormat, and I'll pass in that string of XML date format. Now, I have a date format object that
understands that date format. Next, I'll call customer.setJoined. But instead of passing in the simple
string of content, I'll pass in the date format object's parse method wrapped around content. I'm shown
that there's a potential exception called ParseException. So, I'll use a quick fix and surround that with a
try/catch. I'll get rid of that comment. And now my event handler class is pretty much complete. I'm
capturing all of the data in the endElement event, not in the characters event, because I might have
partial values there. Now I'll save that and go back to my main class, ReadXMLWithSAX. Instead of
outputting just the customer ID, I'm going to output the customer object itself. The Customer class has a
toString method which will output a certain amount of data. And, I'm ready to test. I'll run the code.
And, I'll see for each data item that I get the ID, then the name of the customer, and then the date they
joined. And the format of that output is determined by this method here, the toString method at the
bottom of the Customer class. So, now we've successfully parsed an XML file using the Simple API for
XML. We've set up an event handler class. We've handled all the events we care about, endElement,
startDocument, and so on. And then we've used Java objects to collect the data. And it's all incredibly
fast. Because SAX is a streaming API, you can manage very large XML files and not worry about
running out of memory or other resources. There's still some code to do, though. You'll need to add
code if you're going to deal with namespaces in XML files, and you'll also want to add some error
handling code. And I'll show you how to do both of those in the next set of movies.
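Putting the steps from this movie together, a handler that captures text might look roughly like this. Several details are assumptions: the nested Customer class is a cut-down stand-in with public fields instead of setters, the date mask yyyy-MM-dd'T'HH:mm:ss is inferred from the description of the dates, the phone and about elements are left out for brevity, and a ParseException is rethrown as a SAXException rather than handled by a generated try/catch.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler; element names follow the transcript's customers.xml
class CustomerTextHandler extends DefaultHandler {

    // Cut-down stand-in for the course's Customer class (an assumption)
    static class Customer {
        int id; String name; int age; boolean active; BigDecimal balance; Date joined;
    }

    // Assumed mask for dates such as 2012-04-16T00:00:00
    private static final String XMLDATEFORMAT = "yyyy-MM-dd'T'HH:mm:ss";

    private List<Customer> data;
    private Customer customer;
    private String currentElement = "";
    private StringBuilder currentText;

    List<Customer> readDataFromXML(String xml) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), this);
        return data;
    }

    @Override
    public void startDocument() {
        data = new ArrayList<>();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        currentElement = qName;
        switch (currentElement) {
        case "customers":
            break;
        case "customer":
            customer = new Customer();
            customer.id = Integer.parseInt(attributes.getValue("id"));
            data.add(customer);
            break;
        default:
            currentText = new StringBuilder(); // start collecting text from scratch
            break;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Can fire more than once per element, so append rather than assign
        if (currentText != null) {
            currentText.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (currentElement.equals("customers") || currentElement.equals("customer")) {
            return;
        }
        String content = currentText.toString();
        switch (currentElement) {
        case "name":    customer.name = content; break;
        case "age":     customer.age = Integer.parseInt(content); break;
        case "active":  customer.active = Boolean.parseBoolean(content); break;
        case "balance": customer.balance = new BigDecimal(content); break;
        case "joined":
            try {
                DateFormat df = new SimpleDateFormat(XMLDATEFORMAT);
                customer.joined = df.parse(content);
            } catch (ParseException e) {
                throw new SAXException(e); // stop parsing on a malformed date
            }
            break;
        default:
            break;
        }
        currentElement = "";
    }

    public static void main(String[] args) throws Exception {
        String xml = "<customers><customer id=\"3\"><name>Ann &amp; Co</name><age>41</age>"
                + "<active>true</active><balance>12.50</balance>"
                + "<joined>2012-04-16T00:00:00</joined></customer></customers>";
        for (Customer c : new CustomerTextHandler().readDataFromXML(xml)) {
            System.out.println(c.id + " " + c.name + " " + c.joined);
        }
    }
}
```

The values are assigned in endElement rather than in characters because only at that point is the text for the element known to be complete.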

Handling namespace strings and prefixes with SAX


So far, I've described how to use the Simple API for XML to parse a simple XML file, that is, an XML file
without namespaces. This XML file has simple element names such as customers, customer, and so on,
and no namespaces or namespace prefixes. When you work with this sort of XML file, when you get
to one of the element handlers, start element or end element, you'll get three arguments named uri,
localName, and qName.
And with this sort of simple XML file, both the uri and the localName will be blank strings. The
element name is contained in the qName argument and that's why we're assigning it here to the current
element variable. But now let's see what happens when we try to parse an XML file that does have
namespaces. I'll use this file, NSCustomers.xml, which is in the Data folder of the data provider
package. This is pretty much the same data. It has the same data structure, but it now has a namespace.
The namespace string is here, and it's assigned to a prefix of cust, c-u-s-t. I'll go to my class,
ReadXMLWithSAX.java, and I'll change the name of the file that I'm parsing to NSCustomers.xml. And now, I'll
just run the code and see what happens, and I get a response saying that there's no data in the
document. The problem is that I now have a collision between the way I coded my class, and the
structure of the XML file. Let's do some debugging.
I'll go to SAXCustomerHandler.java, and I'll add a breakpoint on the line that's assigning the currentElement
variable. I'll double-click next to the line number, then I'll go back to ReadXMLWithSAX, and
I'll click the Debug icon on the toolbar. When I get to the breakpoint, I'll be prompted to switch to the
Debug perspective. I'll click Yes, and that takes me to the point in the code where I have my breakpoint.
Now, I move the cursor over the three arguments. uri is still blank, localName is still blank, and qName
now contains the complete element name including the prefix.
Essentially, there isn't any parsing of the namespace. No recognition of the relationship between the
prefix and the namespace string. To get that capability, you need to turn on a feature of the SAX parser,
called namespace aware. So I'm going to terminate this process, go back to my Java perspective, and
go up to the top of the code in SaxCustomerHandler. Here's where you make the change. Place the
cursor at the end of the line where you're creating the factory object.
Make a new line after that, and add this code, factory.setNamespaceAware and set the value to true.
Make sure you put this code before you create the parser object. If you put it after the line that creates
the parser object, it won't be applied to the parser you're creating. I'll save that change, go back to my
main class, and debug again. When I get to the breakpoint, I'll once again take a look at the arguments
that are being passed in. And now the uri contains the namespace string, the localName contains the
element name without the prefix, and the qName still contains the entire element name with the prefix.
So now I know how to fix my code. Instead of looking at the qName argument, I'll look at the
localName argument. So I'll terminate this process and I'll change this line of code so that instead of
examining qName I will examine localName. I'll save that change and once again go back to my main
class. And this time I'll run the code. And in the console, I see that I get back my expected response,
that I have 1,000 customers, and I'm listing all of the customers in the data file.
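
As a rough sketch of what the debugger showed, the class below parses the same small document twice, once with the default factory and once with namespace awareness turned on, and records the three arguments that startElement receives. The class name, the inline XML string, and the namespace string http://example.com/customers are all made up for illustration; they stand in for NSCustomers.xml and are not the course's actual files.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceAwareDemo {

    // Hypothetical sample data; the namespace string is invented for this sketch.
    static final String XML =
        "<cust:customers xmlns:cust=\"http://example.com/customers\">"
      + "<cust:customer id=\"1\"><cust:name>Ann</cust:name></cust:customer>"
      + "</cust:customers>";

    // Parses the sample and records what startElement receives for each element.
    static List<String> describeElements(boolean namespaceAware) throws Exception {
        List<String> seen = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // This must happen before the parser is created, or it has no effect on that parser.
        factory.setNamespaceAware(namespaceAware);
        SAXParser parser = factory.newSAXParser();
        parser.parse(new InputSource(new StringReader(XML)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                seen.add("uri=" + uri + " localName=" + localName + " qName=" + qName);
            }
        });
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Not namespace aware:");
        describeElements(false).forEach(System.out::println);
        System.out.println("Namespace aware:");
        describeElements(true).forEach(System.out::println);
    }
}
```

Running main prints one line per element for each mode, so you can compare which argument carries the element name in each case.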
Once you've changed the parser feature so it can handle namespaces, it becomes possible to add
conditional code. So that if you're working with an XML file where there are naming collisions, that is,
more than one element of the same name, but with different namespaces, you can add conditions to
examine the uri. Let's try that. I'll go back to my Java perspective, and I'm still in the startElement
method. And I'll go down to the case for the customer.
Let's say for example, that there are two customer elements in this file. But I'm only interested in the
one with the uri that I showed in the XML file. Well, I'll go back to the XML file and I'll select and
copy that URI to the clipboard. Then I'll go back to the handler class, and I'll add an if condition within
the case statement. I'll set the condition as follows. If uri.equals, and then I'll paste in that uri.
And then I'll take all this code, before the break statement, and move it inside the conditional block. So
now I'll only be creating a new customer object when I'm looking at a customer that's a member of the
correct namespace. And you can do the same sort of thing in the endElement event method, which also
receives these same three arguments, the uri, the localName, and the qName. I'll add another condition
to this if clause. Right now I'm making sure that the current element doesn't equal customers or
customer, but I'll also make sure that the current element is a member of the correct namespace.
I'll add an exclamation mark and then uri.equals, and once again I'll paste in that namespace string. So
now I'm saying if the current element isn't customers, and it isn't customer, and if the uri does equal this
namespace, then I can continue with the rest of this code. I'll save those changes, go back to my main
class and run without debugging again. And once again, I've successfully collected the data. So, that's a
look at how you can deal with XML files with namespaces.
Even if you don't care about the namespaces, you need to handle the namespace strings correctly. You
do that by setting the feature of the parser with setNamespaceAware to a value of true, and then
changing the arguments that you are examining in the startElement and endElement methods.
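
Here's a minimal sketch of the naming-collision case, assuming a made-up document in which name elements appear under two different namespaces. The handler only collects text when the uri argument equals the namespace it cares about. The class name, the constant CUSTOMER_NS, and both namespace strings are hypothetical, and the handler is simpler than the course's SaxCustomerHandler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceFilterDemo {

    // Hypothetical namespace string for the elements we want.
    static final String CUSTOMER_NS = "http://example.com/customers";

    // Two customer/name structures share local names but live in different namespaces.
    static final String XML =
        "<c:customers xmlns:c=\"http://example.com/customers\""
      + " xmlns:o=\"http://example.com/other\">"
      + "<c:customer><c:name>Ann</c:name></c:customer>"
      + "<o:customer><o:name>Not ours</o:name></o:customer>"
      + "<c:customer><c:name>Bob</c:name></c:customer>"
      + "</c:customers>";

    static List<String> readNames() throws Exception {
        List<String> names = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // required for uri and localName to be populated
        SAXParser parser = factory.newSAXParser();
        parser.parse(new InputSource(new StringReader(XML)), new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                text.setLength(0); // start collecting text for the new element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Keep the value only when both the local name and the namespace match.
                if ("name".equals(localName) && CUSTOMER_NS.equals(uri)) {
                    names.add(text.toString());
                }
            }
        });
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readNames());
    }
}
```

Putting the constant on the left side of equals is just a defensive habit here; it avoids a null check on the uri argument.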

Handling parsing errors in SAX


One of the great advantages of a streaming API such as Simple API for XML is that, if you encounter a
problem in an XML file, such as a tag name that doesn't match, you can still collect data up to the point
of the problem, whereas with a tree-based API such as Document Object Model, the entire file will
become useless. I'll show you how to handle simple errors in a project named SAXErrors. This version
of the project goes back to reading simple XML files without namespaces.
To create an error, I'll go to customers.xml in the data folder of the DataProvider project. I'll copy it to
the clipboard. And then I'll paste a new copy into the same folder. And I'll name this new version of the
file customersError.xml. Now I'll close the version that I currently have open and open the new
version, customersError, and I'll introduce a problem into the XML file. I'm going to go down to the
third customer element that has a name of Lynda Byrd, and I'm going to change the end tag for the
customer element from customer to wrong name.
Now because SAX is a streaming API, it's not going to realize there's a problem with the XML file until
it encounters this tag. So, it's going to successfully parse the first and second customers completely. It'll
start parsing the third customer, but then when it hits the end element event of this customer element,
it'll trigger an error. I'll save that change, and I'll go to the class ReadXMLWithSAX.java from the
current project.
And I'll change the name of the file that I'm parsing to customersError.xml. Now I will just try running
the code and I get a big old error in the console. It tells me that there is an exception in the thread and it
tells me explicitly where the problem is, where the file is, which line number the problem is on and
exactly what the error is. That the element type customer must be terminated by the matching end tag
customer. So now how do you handle this in the Java code? The SaxCustomerHandler class for my
project extends the DefaultHandler class.
So far in this class I've added overrides of methods that are used to read data: startDocument,
endDocument, startElement, endElement, and characters. To handle errors, I'll add a few more overrides. I'll go
down to the bottom of the class, and place the cursor after the characters method. I'll press Ctrl+Space.
And I'm going to add three new methods. The first will be called warning. I'll press Ctrl+Space again.
The second will be error, and the third will be fatalError.
For each of these method overrides, I'll remove the comment and the call to the superclass's version of
the method. And for the moment I'll just add some system output. I'll output warning for the warning
method, error for the error method, and fatal error for the fatalError method. I'll save my changes, go
back to my main class, and run the code again. And I see that I'm encountering the fatal error event.
And this is what happens when you deal with a malformed XML file.
You'll always get a fatal error, because the SAX parser can no longer continue forward reading the rest
of the file. But remember, I only introduced that error into the XML file at the end of the third data
item, so it still should be possible to collect some data from this XML file. So I'll go back to
SaxCustomerHandler.java, and I'm going to add some code right here, where I'm calling the parser object's
parse method. First, I'm going to remove the SAXException object from the throws clause of the
readDataFromXML method.
That results in showing two errors: one where I'm creating the parser object and one where I'm parsing
the XML file. So I'm going to wrap all of this code with a try-catch block. I'll select these three lines of
code that are creating the factory object, the parser object, and parsing the data. Then I'll right-click and
choose Surround With > Try/multi-catch Block. Because I'm already handling the other errors here in the
throws clause, my catch only catches SAXException.
In the catch block, I'll remove the comment and the stack trace and I'll replace that with just outputting
the error message, with sysout and e.getMessage. Now because I've handled the error in the catch
block, the code will continue executing. And I will return whatever data I collected. I'll save my
changes and come back to the main class and run the code and now in the console, I see that I got a
fatal error, I also see the error message that the customer element was not terminated correctly, but I do
retrieve the three customer objects that I was trying to retrieve.
If you debug more carefully, in some cases you'll find that, that last object is incomplete. In this case, it
should have all the data that's expected because we didn't get to the problem in the XML file until the
end of the customer object and all of the child elements should have already been read correctly. And
not all applications will be okay with retrieving just some of the data. But this sort of code takes
advantage of this capability of a streaming API to return some of the data up to the point where a
parsing error is encountered.
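
The pattern above can be sketched roughly as follows. This is a simplified stand-in, not the course's handler: the class name and the inline XML are made up, the handler only collects name values, and the third customer's end tag is deliberately wrong. The three error overrides just print a label, and the SAXException is caught so that whatever was collected before the problem is still returned.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SaxPartialReadDemo {

    // Hypothetical data: the third customer is closed with the wrong tag name.
    static final String XML =
        "<customers>"
      + "<customer><name>Ann</name></customer>"
      + "<customer><name>Bob</name></customer>"
      + "<customer><name>Cy</name></wrongname>"
      + "<customer><name>Dee</name></customer>"
      + "</customers>";

    static List<String> readNames() throws ParserConfigurationException, IOException {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("name".equals(qName)) {
                    names.add(text.toString());
                }
            }

            @Override
            public void warning(SAXParseException e) {
                System.out.println("Warning");
            }

            @Override
            public void error(SAXParseException e) {
                System.out.println("Error");
            }

            @Override
            public void fatalError(SAXParseException e) {
                System.out.println("Fatal error");
            }
        };

        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(XML)), handler);
        } catch (SAXException e) {
            // Report the problem, then fall through and return the data read so far.
            System.out.println(e.getMessage());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readNames());
    }
}
```

Whether returning partial data is acceptable depends on the application; a caller that needs all or nothing would rethrow from the catch block instead.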

Part 3: Creating and Parsing XML with Document Object Model
1hr 12m
How DOM works
The next API I'll describe is called Document Object Model, or DOM. Like SAX, the DOM API is one
of the oldest available APIs. It's a tree-style XML processor, which means that it stores its data as a
hierarchical tree of objects in memory. Document object model is a read and a write API. You can both
parse and create XML content with it, and it's an implementation of a non-language-specific
recommendation from the World Wide Web Consortium, or the W3C.
You can find the technical documents for DOM at this web address. As I mentioned, it's one of the
earlier APIs for Java and there have been a lot of improvements on the DOM programming model,
including the open-source library, JDOM. But, if you want to understand the nitty gritty of how XML is
understood across the computing industry, knowing how to program in DOM is a must. The DOM API
for Java is included in all current versions of the Java SDK.
And it's available on Android, built into the runtime, but it can be memory intensive, so it's not always
recommended for use on mobile devices. Some of the benefits of tree-based processors such as
DOM include that, because the document is stored all at the same time, it gives you great search
capability. DOM-based documents can be queried with the XPath expression language, and there are
certain methods of the Node interface, one of the key types in the DOM API,
that let you look for data without having to navigate down through the hierarchy one level at a time.
Some of the common issues with DOM have to do with how it stores all of the data at the same time in
memory. If you need to deal with very large XML documents, you might find that they exceed
available resources. You can run out of heap space. And you can sometimes solve that by adding heap
space to your Java Virtual Machine, but in some cases you'll be better off using a streaming processor.
Also, because DOM is holding everything in memory at the same time, typically it's slower than a
streaming processor at run time. And finally, compared with certain, more modern APIs, some
developers find the DOM programming style to be cumbersome. When you program with DOM, you'll
find that you're using a lot of interfaces, and the actual concrete classes that are being created in the
background are not exposed to you as the developer. So there are multiple layers of objects and
multiple layers of references going on at the same time.
JDOM, as an example, cleans this up significantly by using concrete classes for the most part, where
DOM uses interfaces. To parse XML with the Document Object Model, you start by reading the XML
content into memory. Here's an example of reading a file. I'm starting off with the file object and then
declaring a document object. Then, within a try-catch block, I'm creating an instance of a builder. Just
like the SAX API, the DOM API uses the factory design pattern.
First, you create the factory, then you can modify its features and then create a document builder. Or
you can combine all that into a single statement as is done here. Then you use the builder object's parse
method. The parse method can accept a file, an input stream, or a couple of other different kinds of
sources. Once the document has been parsed, and the entire thing is in memory, you can then traverse
the document, forward and back, or search its contents, and here are a couple of methods you can use.
Node is the superinterface for all the interfaces representing different parts of XML content. So,
one of the subinterfaces of Node is Document. And then there's Element, Text, CDATASection, Attr for
attribute, and many others. All of these interfaces inherit the interface Node. And anything that's on the
Node interface can be used by any part of the XML content. To get data out of XML, one strategy is to
walk the tree, moving down from one layer of the tree to the next.
If you wanted to do that, you'd start by getting a reference to the document's root element, and you'd do
that with a method called getDocumentElement. It returns an instance of the Element interface. And
then from there, you can walk down the tree by going to the child nodes of the root. In order to get the
child nodes of the root element or any other element, call a method named getChildNodes. You'll get
back an object called a NodeList, and you can look through the nodes in that list with a standard
array-style for loop.
You can't use a for-each loop with a NodeList, like you can with some of the classes that are members
of the Java Collections Framework like List and ArrayList. The NodeList interface predates the
Collections Framework and doesn't implement Iterable. And so, in order to loop through its contents, you
need to use this array-style syntax. You find out how many items there are in the node list with the
getLength method and then get a reference to one of the nodes with the item method. The item method
returns the object as a Node, but if you know it's, for example, an element, you can downcast it to that
particular interface.
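
As a minimal sketch of walking the tree this way, the class below parses a small inline document, gets the root with getDocumentElement, and loops over its child nodes with an array-style for loop. The class name and the sample XML are made up for illustration, and the document is read from a string rather than a file so the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomWalkDemo {

    // Hypothetical sample data standing in for a customers file.
    static final String XML =
        "<customers><customer id=\"1\"/><customer id=\"2\"/></customers>";

    static List<String> describeChildren() throws Exception {
        // Factory and builder combined into a single statement.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));

        Element root = doc.getDocumentElement();
        NodeList children = root.getChildNodes();

        List<String> result = new ArrayList<>();
        // NodeList is not Iterable, so a for-each loop won't compile here.
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            // Only downcast when the node really is an element (it could be text or a comment).
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                Element element = (Element) node;
                result.add(element.getTagName() + ":" + element.getAttribute("id"));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describeChildren());
    }
}
```

The node type check matters in real files, where the whitespace between elements shows up as text nodes in the child list.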
Another way of getting elements in DOM is to search by name using a method called
getElementsByTagName. This method takes a string, and once again returns a node list.
getElementsByTagName does a deep search, so if you call it from the document node, you're searching
the entire document for all elements of a particular name. Be careful with this. If an XML structure has
naming collisions, that is, elements that share a name but are at different levels of the document, this
sort of code can return a whole bunch of elements that share the same name
but don't have the same sort of data. To retrieve text from an element there are a couple of possible
approaches. One approach is to treat the child of the element as a text node, and it will take two
statements to get the data out. First, you would call the element's getFirstChild() method. That returns a
reference to the text node. You would then cast that as a Text object, and then call the object's
getNodeValue method. That's the manual approach. A more automatic approach is to use a convenience
method called getTextContent().
And that'll take all of the text content that's within an element and return it as a single string. Typically
this is the preferred approach. Some things to watch out for in DOM include that, as I mentioned, large
documents can cause Java to run out of memory. If you're working in a desktop or a server
environment, you can try increasing the maximum heap space with a -Xmx virtual machine
argument. For example, the argument -Xmx512m would allow the heap to grow to 512 megabytes (the
similar -Xms argument sets only the initial heap size), and
you might be able to read the large document then without crashing the virtual machine.
If you're working on Android, on the other hand, and you need to work with larger XML content, parse
with XmlPullParser instead. It's a streaming parser, and it's much more memory efficient and typically
faster than DOM or any other tree based processor. So those are some of the fundamental things to
know about DOM. Now let's take a look at the code.
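
Before moving on, here's a small sketch of the search and text retrieval approaches just described, again using made-up inline data and a hypothetical class name. It finds every name element with getElementsByTagName, then reads the first one the manual way (getFirstChild plus getNodeValue) and all of them with getTextContent.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;

class DomSearchDemo {

    // Hypothetical sample data.
    static final String XML =
        "<customers>"
      + "<customer><name>Ann</name></customer>"
      + "<customer><name>Bob</name></customer>"
      + "</customers>";

    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
    }

    // Manual approach: treat the element's first child as a text node.
    static String firstNameManual() throws Exception {
        Element first = (Element) parse().getElementsByTagName("name").item(0);
        Text textNode = (Text) first.getFirstChild();
        return textNode.getNodeValue();
    }

    // Convenience approach: getTextContent returns all text inside the element.
    static List<String> allNames() throws Exception {
        // Called on the document, this is a deep search of the whole tree.
        NodeList nodes = parse().getElementsByTagName("name");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getTextContent());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstNameManual());
        System.out.println(allNames());
    }
}
```

The manual cast assumes the element's first child is a text node; with an empty element getFirstChild returns null, which is one more reason getTextContent is usually the easier choice.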

Creating a DOM document


For all of the XML APIs that support reading and writing XML documents, I'll start by showing
how to create a document and then I'll follow up by showing how to parse one. With the document
object model, you'll create a tree of objects in memory and then serialize that to a string or to a file. I'm
working in a new project called DOMCreateDocument that has a couple of starting classes. The main
class is CreateXMLWithDOM.java, and it has an empty main method, and there's also a class called
DOMCreator.
It has a private constant called XMLDATEFORMAT, but otherwise, has no code. I'll begin in the main
class, in the main method, and I'll create an instance of the DOMCreator class. Now, you could put all
of this code into a single class, but I'm showing how to separate the code a bit to make it more
maintainable in a production application. I'll start with
a data type DOMCreator and I'll name the object creator, and then I'll instantiate with the class's
no-arguments constructor.
Next, I'll call a method of that class that doesn't exist yet that I want to return a DOM document object.
I'll type the return data type first, Document. Then I'll press Ctrl+Space. Notice that there are a couple
of different Document classes available. Choose the one in the package org.w3c.dom. Name the object
doc and assign its reference by calling a method of the creator object that we'll call createXMLDoc.
Now again, that method doesn't exist yet. So I'll press Ctrl+1 on Windows or Cmd+1 on Mac for a
quick fix and I'll choose Create the Method. That takes me to the DOMCreator class and it creates the
new method. I'll get rid of the auto-generated comment, and now I'm ready to start the process of
creating a DOM document. I'll be working with three different classes. The first is called
DocumentBuilderFactory. This will take a bit of code, so I'll expand to full screen.
I'll type Document and press Ctrl+Space, and I'm looking for this class, DocumentBuilderFactory in
javax.xml.parsers. And I'll name the object factory. To instantiate the object, call a static method of
the class called newInstance. Don't try to call a no-arguments constructor. You won't be able to
do that. Now, the factory object has a bunch of preset defaults, but you can override them by calling a
variety of methods.
I'll type factory.set. And here's a list of all of the available methods. You can control validation,
namespace awareness, attributes and so on. I'm going to use the factory in its default state. So I'll get rid of
that line of code, and the next step is to use the factory to create a document builder object. I'll declare
the data type, DocumentBuilder. I'll name this builder. And I'll call the factory object's
newDocumentBuilder method.
Finally, I'm ready to create the document itself, so I'll type Document, and I already have the import for
that class. I'll name it doc, and I'll create it by calling the builder object's newDocument method. And
now I have a document object model document that I can fill in with data. Notice that there's a warning
here telling me that there's an unhandled exception. I'll deal with that by using a quick fix and adding a
throws declaration to this method. I'll save that change, and then I'll go back to my main class and I'll
see that that potential exception is bubbling up to the main method of the main class.
When you move up to the editor for this class, the quick fix functionality in Eclipse might not work
right away. If it doesn't, save your changes, then press Ctrl+1 or Cmd+1, and you should see that you
can add a throws declaration, or surround the code with a try-catch. And as I did in DOMCreator, I'll
add a throws declaration and save again. So now, I've handled those exceptions and I'm ready to go to
the next step. All documents need a root element, and in document object model programming, they
don't have it by default.
You have to create it. To create an XML element object using document object model, call a method of
the document object called createElement. The code will look like this. I'll type in the name of the
class, Element, press Ctrl+Space, and make sure you choose the right class here. There are many
element classes available. You want the one from the package org.w3c.dom. I'll name it simply root,
and I'll get its reference by calling doc.createElement, and I'll pass in the name of the element I
want to create, customers.
This is how you'll create all elements for the XML document. It doesn't matter where the element will
go. You'll always create it by calling this method of the document object. Now the element has been
created, but it's not attached to the document yet. The next step is to tell the document where it's
supposed to go. Each node, or element, of the document, has a method called appendChild, and I'll call
that method now. I'll use doc.appendChild, and I'll pass in the root element, which I named root.
So it's two steps. Create the element, and attach it to the document where it should go. I'll return the
document in its current state. And that's all the code that I'll add to this class in this exercise. Now I'll
go back to the main class. To test this, and find out whether it worked, you might try this bit of code. I'll
use System.out.println, and I'll output doc.toString. I'll run the code, and it tells me that the document is
null, but I know that that's not true.
I created it in that class. So how do you inspect a Document Object Model document without fully
printing it out to a string or to a file? The simplest way is to get a reference to the document's child
object, which I said was the root element, and here's the code that I'll add to do that. I'll go back to the
main method again, and I'll create a new object that I'll data type as a Node. As I described in the
previous movie, Node is the superinterface of Document, Element, and all the other different node types in
XML.
I'll declare this as a node, I'll add an import for it, again making sure I choose the right one from
org.w3c.dom. I'll name it root, and I'll get its reference by calling a method of the document object
called getFirstChild. The getFirstChild method returns a node object, and it's available from all nodes
that can have child objects, such as elements or documents. Finally, I'll output the name of the node. I'll
use standard system output, and I'll output root.getNodeName.
There's a getNodeName method and a getNodeValue method. Right now, I'm only interested in the
name. I'll run the code again, and I see once again that information that it's telling me it's null, when I
know it's not. But then, I see that I've successfully retrieved the name of the element that's the root
element of the document. So, if your code has gotten this far, you've successfully created your
document object, and you're ready to start adding data to it in the form of child elements that have their
own child elements, and text, and attributes.
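
Pulled together, the steps in this movie look roughly like the sketch below. It folds the DOMCreator method and the main method into one hypothetical class so it can run on its own; the class name is made up, and the structure is simplified compared with the two-class layout used in the exercise.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomCreateDemo {

    static Document createXMLDoc() throws ParserConfigurationException {
        // Step 1: the factory comes from a static method, not a constructor.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Step 2: the factory creates the builder (using its default settings here).
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Step 3: the builder creates an empty document.
        Document doc = builder.newDocument();

        // Elements are always created by the document, then attached separately.
        Element root = doc.createElement("customers");
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) throws ParserConfigurationException {
        Document doc = createXMLDoc();
        // Printing the document itself is not informative, so inspect its first child.
        System.out.println(doc);
        Node root = doc.getFirstChild();
        System.out.println(root.getNodeName());
    }
}
```

A document can only have one root element, so calling doc.appendChild with a second element would fail; every later element gets attached to root or to one of its descendants.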

Adding child elements with DOM


After creating an XML document, the next step is to populate it with data items. Our goal is to create
this XML structure where the root element is named customers. And then there's one customer element
for each data item. The id property is represented as an attribute of the customer and all of the other
data is in child elements. In this demonstration, I'll show you how to create the child elements named
customer and set the id attribute.
I'm working in a new version of the project called DOMChildElements. This version of the project is
linked to the DataProvider project. And it's using the DataProvider class from that project to get some
data. Because it's asking for a small data set, it retrieves ten data items from the JSON file. I'll go to my
DOMCreator class, and its createXMLDoc method. This method is now receiving that list of data, a
List object containing instances of the Customer class.
After I create the root element, my next step will be to create child elements of the root. And then, set
the ID attribute from the ID property of the current customer object. I'll place the cursor after the code
that creates the root element and appends it to the document. And now, I'll do a loop. This will be a
foreach loop iterating through the list of customers. I'll type foreach and press Ctrl+Space and choose
that code template. Eclipse fills it in exactly the way I want.
I'm iterating through the data list, and each time through the list, I have one customer to work with. The
first step is to create another element. I'll declare an Element object that I'll name custElement. Just as I
did with the root element, I'll create the element by calling the createElement method of the document
object. Notice that in this version of the code, the document object is declared outside of
createXMLDoc so that it's available everywhere in the code. So right here, I'll type doc.createElement
and I'll pass in the name of the new element, which will be customer.
Next I'll attach that element to the root element. I'll use root.appendChild, and I'll attach the new
element, custElement. The next step is to set the id attribute. We'll get this value from the private property
id, which will be accessed through this getter method, getId. So, going back to DOMCreator: when I
get the id from the customer object, it will start as an integer, but to pass it to the id attribute of an
element, it has to be a string.
So I'll create a new variable called idAsString and I'll get its value by calling Integer.toString and
passing in the customer object's getId method. Now I have a value that I can pass to an attribute, and to
do that, I'll call the element object's setAttribute method. There are a number of different versions of
setAttribute. I'm going to use this version, which accepts two strings for the name and the value. For the
name, you could pass in a literal string like this. But for consistency, I'm going to use a constant of the
Customer class, calling Customer.ID.
And then, I'll pass in idAsString as the value. So now, my root element contains one customer element
for each data item. I'll save these changes. I'm still returning the XML document at the end of the
method. So now I'll go back to the calling scope, in the class CreateXMLWithDOM. To test this, I'm
going to retrieve the child nodes of the root element. Remember, in the previous movie, I showed you
that the document has a single child element, which we retrieved as a Node object.
To get the child nodes, retrieve something called a NodeList. I'll type in the name of the class and press
Ctrl+Space to make sure I have an import. I'll name this nodes. And I'll get its value by calling the
root node's getChildNodes method. This will retrieve all of the child nodes of the root node. Next I'll
loop through this node list. To loop through a node list, you'll need to use a classic array-style for loop.
You can't do it with for-each, because the NodeList interface doesn't implement all the methods that are
needed. It's not like List, which is a member of the Collections Framework. So when I press Ctrl+Space,
I'll choose for, iterate over array. I'll set the maximum value of the loop, calling the nodes object's
getLength method. That's like the size method of the List interface. It returns the number of items in the
node list. Now, each time through the loop, I'll be working with a single node.
I'll declare another Node that I'll call child, and I'll get its reference by calling a method of the node list
object, nodes, called item. And I'll pass in my counter variable, which is i. So now I have a node,
and I'm going to output the name of the node. I'll use my system output, and I'll call
child.getNodeName, just like I did with the root element. So now I'm ready to test my code. Notice
once again that I asked for a small data set, and that means I'm getting back ten data items.
I'll run the code, and the first item I output is customers and all the rest are simply customer. I'll change
this now to medium and that'll retrieve 1,000 data items. I'll run the code again. And this time I get back
a whole bunch of customers. I'll return this code to small because I don't need all that data. And that
completes the next step in creating an XML document with DOM. Creating the child elements that are
members of the root element.
So far we've learned how to use the getFirstChild method, which returns one node, and the getChildNodes
method, which returns a list of nodes. In the next exercise, I'll show you how to start populating
other data as child elements of the data elements.
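
Here's a compact sketch of this exercise. The Customer class below is a tiny hypothetical stand-in with only an id property and an ID constant; the real class in the DataProvider project has more fields. The sample data is hard-coded instead of coming from the data provider, and the document is kept as a local variable rather than a field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomChildElementsDemo {

    // Hypothetical minimal stand-in for the course's Customer class.
    static class Customer {
        static final String ID = "id";
        private final int id;
        Customer(int id) { this.id = id; }
        int getId() { return id; }
    }

    static Document createXMLDoc(List<Customer> data) throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("customers");
        doc.appendChild(root);

        for (Customer customer : data) {
            // Create the element, then attach it to its parent.
            Element custElement = doc.createElement("customer");
            root.appendChild(custElement);
            // Attribute values must be strings, so convert the int id first.
            String idAsString = Integer.toString(customer.getId());
            custElement.setAttribute(Customer.ID, idAsString);
        }
        return doc;
    }

    // Builds a small document and lists the root's children as name#id strings.
    static List<String> demo() throws ParserConfigurationException {
        Document doc = createXMLDoc(Arrays.asList(new Customer(1), new Customer(2)));
        Node root = doc.getFirstChild();
        NodeList nodes = root.getChildNodes();
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node child = nodes.item(i);
            result.add(child.getNodeName() + "#" + ((Element) child).getAttribute(Customer.ID));
        }
        return result;
    }

    public static void main(String[] args) throws ParserConfigurationException {
        System.out.println(demo());
    }
}
```

The unchecked cast to Element in the loop is safe only because this document was just built and contains nothing but elements under the root.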

Adding data elements and attributes with DOM


Once you've created the structure of the XML document, the next step is to start adding data in the
form of more child elements. I'm working in a new version of the project called DOMAddData, that
picks up where the last exercise left off. For each customer data item, we're creating an element and
attaching it to the root element. Then we're passing in an ID value as an attribute. It turns out that these
two steps, creating an element and attaching it to a parent, are executed over and over and over again.
So to make this a little bit simpler, I'm going to refactor this code, and extract these two lines of code to
their own separate method. I'll select those two lines, right-click, choose Refactor, and then Extract
Method. I'll name the new method, addElement. And click OK. Here's the new method. To make these
variable names make a little more sense, I'll rename them. For each I'll use Eclipse's refactoring
capability.
If you haven't used it before try it from the menu first. Right-click and choose Refactor > Rename. But
after that you can use the keyboard shortcut noted here. I'll choose Rename. And I'll rename the
argument parent, because when this method is called it might be the root element, or it might be one of
the customer elements. Next, I will rename this local variable custElement, and I'll rename it as
childElement. Then finally, I'll take this literal string of customer and cut it to the clipboard.
Go up here where I'm calling the new method, and paste it in as a new, second argument. I'll see a little
error icon appear on the left. So I'll do a quick fix, and I'll change the method to add the new string
parameter. I'll name that new string parameter elementName, and then I'll pass it in to the createElement
method. I've restored the code back to its functionality now. I'm still looping through the list of
customers, and adding a new child element for each data item.
Now, I'm going to make one more change to this method, addElement. I'm going to add a third argument,
which will also be a string, and I'll call this textValue. For most of the elements, when we create them,
we're going to want to set their child text node. And there are a couple of ways of doing this. One verbose
approach is to create an instance of a text node, and then attach it to the new element. But there's a
simpler method available in the Document Object Model, and that's to call a method called
setTextContent.
I'll add a new line after I've created the new element named childElement, and I'll call
childElement.setTextContent and I'll pass in the new textValue argument. I'll go back up here to the
addElement call for the customer, and I'll pass in a blank string. The customer element itself won't need a
text value but all the customer element's child elements will. So now, I'm ready to start adding data. For each
customer, I've already set the ID attribute.
But now I need to create seven child elements. I'll be getting the data from the customer object, but the
values will all need to be formatted as strings. I'll make some space within the for loop. I'll call my new
addElement method, and for the parent element, I'll pass in the customer element. Then, I'll pass in the
name of the first child element, and get that from a constant of the Customer class called NAME. This
constant has a value of name as a lowercase string.
And then I'll pass in the name of the current customer, calling the customer object's getter for the name
property, getName. When I call the addElement method, I'll be creating a new element called name. I'll
be setting its text content to the current customer's name. I'll append it to the customer element, and
return the child element, which I may or may not use. Now I'm going to duplicate that line of code six
times, and I'll go through and add all of the new elements that I need.
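The helper described above can be sketched roughly as follows. This is a minimal illustration, not the course's actual source: the class name AddElementSketch, the sample values, and the use of getOwnerDocument to reach the document are assumptions made so the example stands on its own.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class AddElementSketch {

    // Creates a child element, sets its text, appends it to the parent, and returns it.
    static Element addElement(Element parent, String elementName, String textValue) {
        // Assumption: the owning document is looked up from the parent element.
        Document doc = parent.getOwnerDocument();
        Element childElement = doc.createElement(elementName);
        // Shorthand for creating a text node and appending it to the new element.
        childElement.setTextContent(textValue);
        parent.appendChild(childElement);
        return childElement;
    }

    public static void main(String[] args) throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("customers");
        doc.appendChild(root);

        // The customer element itself gets an empty string as its text value.
        Element customer = addElement(root, "customer", "");
        customer.setAttribute("id", "1");
        addElement(customer, "name", "Jane Doe");   // made-up sample values
        addElement(customer, "phone", "555-0100");

        // Prints the child text values run together, with no tags.
        System.out.println(customer.getTextContent());
    }
}
```

Returning the new element costs nothing for callers that ignore it, and it lets a caller attach further nodes to the element afterwards.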
I've already handled the ID as an attribute, and I've handled the NAME. Now I'll add the PHONE. After
that I'll add the numeric values, AGE and BALANCE. Then the Boolean value ACTIVE. Another
string value ABOUT. And a date value JOINED. Then I'll come back here and change all the method
calls to match the names of the elements. The PHONE element gets the value from getPhone. AGE gets
its value from getAge and so on.
Make all those changes, and then we'll come back and fix up all the data typing. You'll see that errors
have appeared in four places. For the AGE and BALANCE, which are numeric, for the ACTIVE value,
which is Boolean, and JOINED, which is a date. If you try to put in a quick fix from Eclipse, you'll see
that there isn't anything that will explicitly convert these values the way you need them. So you have to
add the code yourself. I'll start with the primitive integer, the AGE value.
For primitives, use one of the helper classes and its toString method. This is an integer value, so I'll
call Integer.toString. And I'll wrap that around the value of the getAge method. The getBalance
method returns an instance of the BigDecimal class. And that class has its own toString method. So I'll
place the cursor after the call to getBalance, and then add .toString. ACTIVE is a primitive boolean. So
once again I'll use the helper class, calling Boolean.toString.
ABOUT is a string and I'll leave that alone for the moment. But for the date value, I'm going to have to
do some extra formatting. I want to output my date using this format. This is where this final string will
come into play, named XMLDATEFORMAT. I'll use this to create a date format object, and then format
the date as I pass it into the element. So, I'll go back down to this code and make some extra space. And
I'll create an instance of the DateFormat class from java.text.
I'll name it df, and I'll get its reference from new SimpleDateFormat, and I'll pass in my XML date
format constant. Then, as I call addElement, I'll format the date value using df.format, and I'll pass in
the value that's returned from getJoined. And that's all the work I need to do right now. I've now added
seven child elements to each customer element. I'll save those changes and I'll come back to my create
XML with DOM class.
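The four conversions described above can be sketched as small helper methods. The class and method names are hypothetical, and the date pattern assigned to XMLDATEFORMAT is an assumption, since the constant's actual value is not shown in this part of the transcript.

```java
import java.math.BigDecimal;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;

class ValueFormattingSketch {

    // Assumed pattern; the course's real XMLDATEFORMAT value is not shown here.
    static final String XMLDATEFORMAT = "yyyy-MM-dd'T'HH:mm:ss";

    // Primitive int: use the Integer helper class.
    static String ageToText(int age) {
        return Integer.toString(age);
    }

    // BigDecimal has its own toString method.
    static String balanceToText(BigDecimal balance) {
        return balance.toString();
    }

    // Primitive boolean: use the Boolean helper class.
    static String activeToText(boolean active) {
        return Boolean.toString(active);
    }

    // Dates need an explicit format object.
    static String joinedToText(Date joined) {
        DateFormat df = new SimpleDateFormat(XMLDATEFORMAT);
        return df.format(joined);
    }

    public static void main(String[] args) {
        System.out.println(ageToText(42));
        System.out.println(balanceToText(new BigDecimal("1234.50")));
        System.out.println(activeToText(true));
        System.out.println(joinedToText(new Date()));
    }
}
```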
And I'm going to add one more line of code. So I can see that those text values have been added to the
DOM tree. I'll place the cursor after the call to getNodeName, and I'll add a new bit of system output,
and here I'll call child.getTextContent. This will output all of the text values from all of the child
elements. It won't output the wrapping tags of the XML. We'll deal with that later. But this will simply
verify that the data is there.
I'll save my changes, and I'll run the code. And here is the result. For each customer you will see that all
of the text is output all run together. Not separated by white space or XML tags or anything else. Again,
we are simply verifying that the data is a part of the XML tree. To see this represented as true XML,
you'll need to do something called a transformation. But we'll handle that later. In this exercise, we've
seen that it takes a few different steps to add an element and set its value, and append it to the tree in
the right place.
But we've also seen that it's possible to structure the code to minimize the amount of repeated code. If
your code is working so far, you're ready for the last couple of steps. In the next movie, I'll show you
how to create text as something called a CDATA section. And finally we'll transform this DOM tree into
recognizable XML.

Wrapping text in CDATA sections with DOM


Most of the time, text values are represented as what XML calls text nodes. For example, the name has
a text node, phone has a text node, and so on. But sometimes, you need to wrap text inside something
called a CDATA section. The purpose of a CDATA section is to preserve literal text, even when it
contains characters that are normally reserved in XML, such as ampersands and angle brackets. In our XML
format, the about element uses a CDATA section.
And I'm going to show you how to create a CDATA section using document object model
programming. I'm working in a version of my project called DOM text as CDATA. And in the DOM
creator class in this project, I'm looping through the list of customer data objects and creating one
customer element for each and then populating data as child elements. In the addElement method, I'm
calling the method called setTextContent. This is a shorthand method that's creating a text node and
then appending it to the element.
For CDATA sections, there is not a similar convenience method, so instead you have to do it the
longhand way. I'll show you how to do this with the about element. I'll go back up to the method that's
adding the child elements. And I'll make some space before the call to addElement for the about value.
First, when I create the element, I'll initially set its text value to an empty string. That will create an
empty element. And then I can add the CDATA section to it.
First I need a reference to the element. Remember that addElement is returning that reference. So I'll
place the cursor before the call, and I'll create a new element variable that I'll call about, and I'll get its
reference from addElement. Next, I'll create a variable data-typed as CDATASection. This is another
DOM node, and it's a member of org.w3c.dom. I'll name it cdata, and I'll get its reference by calling
the method createCDATASection as a member of the document object.
And I'll pass in the value that I want to wrap inside the CDATA section. That'll come from the getAbout
method of the customer object. And finally, I'll append this CDATASection object to the element, using
about.appendChild and I'll pass in the cdata value. And that's the change you need to make. Again, I
didn't pass in a value when I first created the element. Instead, I passed in an empty string, then I created
the CDATA section, wrapped around the value, and I appended that CDATA section to the element.
I'll go back to my main class, and run the code, and I'll see that the data is still being returned all
scrunched together, and that's fine. But now that we have a complete XML structure in memory, the
next step will be to serialize it to XML, so that we'll have an XML string that we can store anywhere, or
a file that contains the XML string.
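As a rough sketch of the CDATA steps from this movie: create an empty element, create a CDATASection from the document, and append it. The class name, the helper name addCDataElement, and the sample text are assumptions for illustration only.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.CDATASection;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CDataSketch {

    // Creates an empty child element, then wraps the text in a CDATA section inside it.
    static Element addCDataElement(Element parent, String elementName, String text) {
        Document doc = parent.getOwnerDocument();
        Element about = doc.createElement(elementName);
        parent.appendChild(about);
        // There is no setTextContent-style shortcut for CDATA, so do it longhand.
        CDATASection cdata = doc.createCDATASection(text);
        about.appendChild(cdata);
        return about;
    }

    public static void main(String[] args) throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element customer = doc.createElement("customer");
        doc.appendChild(customer);

        // Made-up sample text containing characters that are reserved in XML.
        Element about = addCDataElement(customer, "about", "Likes <b>XML</b> & Java");
        System.out.println(about.getTextContent());
    }
}
```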

Serializing a DOM document to a string


My Document Object Model tree is complete in memory and now, I want to output it as an XML string.
I'll show you how to do this in the project DOMTransform. If I run this project, I see that the data is all
packed into the XML structure, in memory. But, now I want to output it as an XML string with tags,
quotes and so on. To do this, I'll use the TrAX API. TrAX stands for the Transformation API for XML
and it includes a bunch of classes such as Transformer, TransformerFactory, DOMSource and others.
I'm not going to need a lot of this debugging code. So I'm going to select all of the code after I've
created my document object and I'll comment it out. Then I'll add some space right here. In order to
transform my document into XML, I'll first wrap the document inside the class called DOMSource. It's
a member of the TrAX API. Be sure to add an import statement for the class. I'll name the object
source, and I'll use a constructor method from this class that's wrapped around the document object.
You can wrap a DOMSource around a document, an element, or any other XML node. Next I'll create a
writer object. You can use a variety of Java writer objects. I want to create a string, so I'll use a
StringWriter. Be sure to include an import statement for the writer class. And I'll name this writer and
I'll instantiate it with a no arguments constructor. Next, I will create an instance of a class called
StreamResult.
StreamResult is also a member of the TrAX API and needs to be imported. I will name this result and I
will instantiate it with its constructor method wrapped around the writer object. Notice that you can
wrap a StreamResult around a file, an output stream, a string, or as I'm doing here, a writer. Now, I
need to create a transformer object. To create a transformer object, start by creating something called a
TransformerFactory.
Just as with a document builder, you start by creating the factory object, calling a static method called
newInstance. Once you have the factory object you can create the actual transformer, which I'll do here.
Transformer is a member of the TrAX API, too. Be sure to include the import, and name this object
transformer. Get its reference from factory.newTransformer. And now we're ready to transform to
XML. Call transformer.transform and pass in two arguments, the source and the target.
The source will be our DOMSource object, which I named source, and the target will be my stream
result. When the transform method is complete, the stream result's target, which is my string writer, will
contain the XML string. To get its value, declare a string called xmlString, and assign it from
writer.toString. And now, you have an XML string that contains all of your data. There are a couple of
errors, so let's take care of those.
The first tells us that there's an unhandled exception called TransformerConfigurationException. You
can either wrap that inside a try/catch or, as I'm going to do here, add it to a throws declaration for the
current method. The call to the transform method also has a potential exception. This one is called
TransformerException. And I'll add that to my throws declaration as well. I'm going to wrap my code to
make it a little bit easier to read up here, and I've cleared all of my exceptions and I'm now ready to
output my XML.
I'll use system output and output xmlString, I'll run the code and there's the result. The console shows
me that I've generated very tightly compacted XML with no white space, line feeds or any other sort of
formatting. If you're creating XML as part of a web service you might want to leave it that way. This is
a small compact XML packet. But if you want to format it using a kind of formatting known sometimes
as pretty printing, you'll need to add indentation, and that takes a couple of additional lines of code.
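The basic, unformatted transformation described so far can be sketched like this. The class name, the method names toXmlString and sampleDocument, and the sample data are hypothetical; the TrAX classes themselves are the standard ones named in the transcript.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomToStringSketch {

    // TransformerConfigurationException is a subclass of TransformerException,
    // so one throws declaration covers both calls below.
    static String toXmlString(Document doc) throws TransformerException {
        DOMSource source = new DOMSource(doc);
        StringWriter writer = new StringWriter();
        StreamResult result = new StreamResult(writer);

        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer();
        transformer.transform(source, result);

        return writer.toString();
    }

    // Builds a tiny stand-in document with made-up data.
    static Document sampleDocument() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("customers");
        doc.appendChild(root);
        Element customer = doc.createElement("customer");
        customer.setTextContent("Ann");
        root.appendChild(customer);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        String xmlString = toXmlString(sampleDocument());
        System.out.println(xmlString);
    }
}
```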
I'll go back to my main method, and I'm going to add two lines of code after I create the transformer
object. The transformer object has a method called setOutputProperty that takes two string arguments.
The first string is the name of a property, and the second is its value. I'm going to set two properties to
format my XML. First, I'll call transformer.setOutputProperty. And, I'm going to use a class called
OutputKeys.
Be sure to import it and then use a constant of that class called INDENT. The value for this property is
boolean, but it is not represented by a primitive boolean of true or false. Instead, pass in a string of
yes. Next, to control the amount of indentation on each line, we'll use a more complex property. Once
again, I'll call transformer.setOutputProperty, and this will be a fairly odd looking string. It'll start
with a pair of braces, and within those braces a namespace, so that it looks like this:
{http://xml.apache.org/xslt}indent-amount.
Java SE depends on a component that was originally created by Apache that executes these
transformations. By using this string you're going back to the original Apache library, originally known
as Xalan, and using a property that's a part of that library. Set the value to 2 as a string. And that means
that each level of indentation will consist of two space characters. Save your change, and try running
the code again.
And now, your XML should be coming out pretty, completely formatted, and ready for saving to a file
or any other target. I'll take care of my unused imports to clean up the code, and now my code is doing
everything I need. I'm creating my Document Object Model tree. And then I'm transforming it to an
xmlString, and I'm formatting it so that it's easy on the human eye.
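The two formatting properties can be sketched as below. The class name, method names, and sample document are hypothetical. The exact whitespace produced can vary between Java versions and transformer implementations, so this example does not show an expected output.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class PrettyPrintSketch {

    static Transformer getTransformer() throws TransformerConfigurationException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Both values are strings: "yes" rather than true, "2" rather than 2.
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        return transformer;
    }

    static String toPrettyXml(Document doc) throws TransformerException {
        StringWriter writer = new StringWriter();
        getTransformer().transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    // Builds a tiny stand-in document with made-up data.
    static Document sampleDocument() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("customers");
        doc.appendChild(root);
        Element customer = doc.createElement("customer");
        root.appendChild(customer);
        Element name = doc.createElement("name");
        name.setTextContent("Ann");
        customer.appendChild(name);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toPrettyXml(sampleDocument()));
    }
}
```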

Serializing a DOM document to a file


Once you've created the code to output an xmlString, creating an XML file is a pretty simple set of
changes. I'm working in a version of my project called DOMToXMLFile. And it already has all the
code to create an xmlString. I'll run the code, and show that it's creating a formatted string. So I'm
going to do some refactoring on this code. I'll go back to create XML with DOM. And first I'm going to
take the code that's creating my transformer object and setting its properties.
And I'll extract that to its own method. I'll select the code starting with creating the factory and
finishing with setting its properties. Then I'll right-click the selection, choose Refactor, and then Extract
Method, and I'll name this new method getTransformer. Now, the process of creating the transformer
and doing the transformation are just two simple steps. Next, I'll take all the code that's specific to
creating a string and I'll extract that as its own separate method.
I'll start with creating the DOMSource and finish with outputting the string. And I'll once again refactor
and extract a method. And I'll call this outputToString. And I don't need all this additional debugging
code at all, so I'll get rid of it. And that simplifies my code enormously. So now, all the code that's
specific to creating a string, is in the outputToString method. And all the potentially shared code, for
creating the transformer, is in its own method.
And, I can move to the next task: creating a method that will output an XML string to a file. I'll place the
cursor before the getTransformer method and I'll create a new static method. It's going to return void, or
nothing. And, it will be named outputAsFile. This method will accept a document object as the first
argument, just like the string version, but then it will have a second argument, a string that we'll call
filename, and that argument will determine the location and name of the XML file that we're creating.
The process for creating an XML file is very simple. First, create a DOMSource. This is exactly the
same code when you create a string, so I'll copy it and then paste it. Don't try to reuse the same
DOMSource object. The DOMSource is intended to be used once and then discarded. Next, we'll create
a StreamResult. And once again this is very similar code to what we've already done. So, I'll copy and
I'll paste. But this time instead of wrapping StreamResult around a writer object, we'll wrap it around a
file object.
I'll do this code inline within the StreamResult constructor. And I'll create a new anonymous File
object, with new File. I'll press Ctrl+spacebar, and be sure to import the class. And then I'll pass in that
argument filename, and that determines where the file will be created and what it will be named.
Now, I'm ready to create the file, just as I did when creating the string. I'll call the getTransformer
method. This time, I'll combine the steps of getting the transformer and doing the transformation into a
single statement,
with getTransformer().transform, and I'll pass in my source object and my result object. When this code
is done, the file will have been created on disk. I have an exception I need to deal with. So, with the
cursor on that line, I'll press Ctrl+1 on Windows or Cmd+1 on Mac and add a throws declaration to the
current method. There are a number of possible exceptions from this code. And I'll add them all to the
throws declaration.
I'll add some line feeds to make it a little bit easier to read. So now I'm ready to call this method, and
create an XML file. I'll go back up to my main method, and place the cursor after the call to
outputToString, and I'll call my new outputAsFile method. Just as I did with the string, I'll pass in the
document object, and then I'll pass in the location and name of the file that I want to create as a literal
string. My project already has an empty folder named output.
I'll designate that with ./output/, and then I'll set the name of the file as customers.xml. Even if you are
working on Windows, be sure to use forward slashes. Before I run this code, I'll verify that my output
folder is empty. Then, I'll run the code, and I'll see that I'm outputting my formatted string correctly.
Then I'll go back to the Package Explorer, right-click and choose Refresh. And there's my new file, named
customers.xml.
And it contains exactly the same content as the xmlString. The TrAX API architecture is designed for
flexibility. No matter what your target output is, whether it be a file, a console, a string, or any other
output, you should be able to pass an xmlString to it using the TrAX API.
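A minimal sketch of the file version follows, using the method names outputAsFile and getTransformer from the transcript. The class name, the sample document, and the mkdirs call (added so the example runs outside the course project, which already has an output folder) are assumptions.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomToFileSketch {

    // Shared code for creating and configuring the transformer.
    static Transformer getTransformer() throws TransformerConfigurationException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        return transformer;
    }

    // Same source as the string version; only the result target changes.
    static void outputAsFile(Document doc, String filename) throws TransformerException {
        DOMSource source = new DOMSource(doc);
        StreamResult result = new StreamResult(new File(filename));
        getTransformer().transform(source, result);
    }

    // Builds a tiny stand-in document with made-up data.
    static Document sampleDocument() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("customers");
        doc.appendChild(root);
        Element customer = doc.createElement("customer");
        customer.setTextContent("Ann");
        root.appendChild(customer);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: create the folder if it is missing.
        new File("./output").mkdirs();
        outputAsFile(sampleDocument(), "./output/customers.xml");
    }
}
```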

Reading an XML file with DocumentBuilder


To read an XML file with Document Object Model programming, you'll use the same set of classes you
used to create the file, starting off with a DocumentBuilder created from a DocumentBuilderFactory.
But then there are other methods that you'll use. These are methods of the Node, Element and other
classes. I'm working in a new version of the project called DOMReadFromFile. This project has a new
package, called com.example.DOM.read, and a main class, called ReadXMLWithDOM, which depends on
the Customer class.
This class is in the data provider project. And that project is a part of this project's build path.
ReadXMLWithDOM creates a list of customer objects and populates it from this method, get data from
XML. And it passes in an XML file that it's getting from the data provider project. Right now, the
DOMReader class doesn't have much code. It has the XML date format string and it has the method.
Within the method it's creating the list and returning it, but it's not dealing with the XML file at all yet.
So here are the first steps to reading an XML file. Start by creating an instance of the Java File class. Be
sure to import it, and we'll name it xmlFile. And then instantiate it, and wrap it around the file name
that's being passed into the current method. Also, create an instance of the DOM Document class. Be
sure to choose the right version from org.w3c.dom. And name it doc. Initially, set it to null.
Now, you create a DocumentBuilderFactory, and from that, a DocumentBuilder. This code looks exactly
the same as when you create an XML document. Start with the factory, which we'll name factory, and
instantiate it from the DocumentBuilderFactory's newInstance method. Then create the builder. We'll
name it builder and we'll get its reference from the factory object's newDocumentBuilder method.
Now, to open the XML file, parse it and turn it into a collection of objects in memory, all you need to
do is call the builder object's parse method and pass in the file object.
So we'll say doc equals builder.parse. Notice that there are a number of different versions of the parse
method. You can parse a file, an input source or input stream, a string, or a combination of an input
stream and a string. We're using the version that opens a file and reads it into memory. I'll pass in
xmlFile, and now the XML document has been created in memory. There are a few potential errors here.
And I'll handle these by wrapping this code in a try/catch block.
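The parsing steps above can be sketched as follows, using a multi-catch clause. The class name, the method name parse, and the default file name in main are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class ParseFileSketch {

    // Returns the parsed document, or null if the file could not be read or parsed.
    static Document parse(String filename) {
        File xmlFile = new File(filename);
        Document doc = null;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Opens the file and reads the whole document into memory.
            doc = builder.parse(xmlFile);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
        return doc;
    }

    public static void main(String[] args) {
        // Hypothetical default path; pass a real file name as the first argument.
        Document doc = parse(args.length > 0 ? args[0] : "customers.xml");
        if (doc != null) {
            System.out.println(doc.getDocumentElement().getNodeName());
        }
    }
}
```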
I'll select the three lines of code, right-click and choose Surround With, try/multi-catch block. This code
only works in Java 7 or later. If you're working in Android, you should choose a standard try/catch block,
which would create multiple catch clauses. I'll get rid of that comment, and now we'll move down to
the next step. Once the document has been parsed and stored in memory, you can traverse it forward
and back, and search it in a number of different ways. Here's one approach that you can use.
As I showed when I created an XML document in a previous movie, the first child of the document is
the root element. So, you could create an Element object and name it root, and you can get a reference to
it by calling doc.getDocumentElement. This ensures that you're getting the root element, and just in
case the document has other child objects, such as XML processing instructions or comments,
getDocumentElement will skip past those and go directly to the element you are interested in.
Then, to make sure you're getting the right object, you could output the node name of the root element.
I'll save that change and go back to my main class, and run the code, and I see that the root element is
displayed correctly. Now, from there, you'll need to know a little bit about your XML structure. So I'll
go back to the data provider and I'll open the file that we're reading with the document builder. The root
element is customers and the child elements are customer. You could walk down from the root element
and get the child elements, but that turns into quite a bit of code sometimes.
So, a simpler approach is to use a method called getElementsByTagName. getElementsByTagName
does a deep search of the XML tree, and it'll find all elements that have a particular name and
return them as a node list. If you follow this approach you don't need a reference to the document
element. You'll be able to search for the data elements you want without walking down to the document
element. So I'll comment out that code and instead, I'll create a node list.
I'll name it list and I'll get its reference by calling doc.getElementsByTagName. And then, I'll pass in
the name of the data elements for this XML file, customer, as a literal string. Then, I'll output the
number of nodes that were found, using list.getLength. I'll save that change, and run my main class
again, and I am told that I've found 1,000 nodes. That is, 1,000 elements that have the customer node
name.
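The two lookups just described, getDocumentElement and getElementsByTagName, can be compared in a small sketch. To keep it self-contained, it parses a made-up XML string through an InputSource instead of the course's data file; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class FindElementsSketch {

    static Document parseString(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Deep search: counts every element with the given name, at any depth.
    static int countElements(Document doc, String tagName) {
        NodeList list = doc.getElementsByTagName(tagName);
        return list.getLength();
    }

    public static void main(String[] args) throws Exception {
        // The leading comment is a child of the document, but not the root element.
        String xml = "<!-- sample data -->"
                + "<customers><customer id=\"1\"/><customer id=\"2\"/></customers>";
        Document doc = parseString(xml);

        Element root = doc.getDocumentElement();
        System.out.println(root.getNodeName());              // prints customers
        System.out.println(countElements(doc, "customer"));  // prints 2
    }
}
```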
So now, we're ready to do the next step, which is to create data objects that match the XML elements.
I'll use a standard for loop, like one iterating over an array, and I'll set the maximum value of the loop
using list.getLength. For each element that's found I'll create an instance of the Customer class. Then I'll
add that customer object to the data object. That's the list of customers that was created at the beginning
of this method. I'll save the change, and once again I'll run the code, and now I see that I've found 1,000
elements,
and there are 1,000 customer objects in the array list. So the next step, once you've accomplished
this, is to start filling in those data objects, and I'll show you how to do that in the next movie.

Getting data from XML with DOM


So far I've described how to get an XML document into memory as a document object. And then how
to use the method getElementsByTagName to retrieve a list of nodes and deal with them one at a time.
Now I'll show you how to get the data from these nodes using a variety of methods. I'm working in a
version of the project called DOMgetData. And so far, I am succeeding in reading the XML file and
creating one plain old Java object,
an instance of the Customer class, for each customer element in the XML file. I'll do the rest of the
work in this class, DOMReader. Within this for loop, I'm creating an instance of the Customer class and
adding it to my array list. And then at the end of the method I'm returning that list. So all the other code
will go here after each data object has been created. First, I'll describe how to deal with attributes.
They're in many ways the simplest.
Our customers.xml file has customer elements, each of which has an ID attribute. When you retrieve
the value from that attribute, it will always start off as a string. If you want it to be some other data
type, you have to convert it yourself. And to get the attribute value, you can use a method of the
Element class called getAttribute. First, I'll create an element, and I'll name it custElement. And I'll get
its reference by calling list.item, and I'll pass in i, my counter variable.
The item method returns the object as a Node, which is the parent interface of Element. So I have to add
casting to cast it as an Element. Now that I have the custElement reference, I can get the attribute value.
I'll create a string called idAsString and I'll get it by calling custElement.getAttribute. getAttribute expects
a string and I'll pass in a constant of the Customer class, Customer.ID. If you prefer, you can pass in a
literal string.
Then I'll convert that string to an integer and I'll pass it into my customer object. I'll do this in one
statement. I'll call customer.setId, which expects an integer. And I'll pass in
Integer.parseInt and then idAsString. So now I've succeeded in passing the ID from an XML attribute
to my plain old Java object. Next I'll deal with a child element in the XML file. Each customer element
has a name, phone, and so on.
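The attribute handling can be sketched as follows. Because the course's Customer class is not available here, this version simply collects the converted IDs into a list of integers; that substitution, the class and method names, and the sample XML are all assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class AttributeSketch {

    static Document parseString(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Reads the id attribute of every customer element and converts it to an int.
    static List<Integer> readIds(Document doc) {
        List<Integer> ids = new ArrayList<>();
        NodeList list = doc.getElementsByTagName("customer");
        for (int i = 0; i < list.getLength(); i++) {
            // item() returns a Node, so cast it to Element to reach getAttribute.
            Element custElement = (Element) list.item(i);
            // Attribute values always arrive as strings.
            String idAsString = custElement.getAttribute("id");
            ids.add(Integer.parseInt(idAsString));
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseString(
                "<customers><customer id=\"101\"/><customer id=\"102\"/></customers>");
        System.out.println(readIds(doc)); // prints [101, 102]
    }
}
```

Note that Integer.parseInt throws a NumberFormatException if an attribute is missing or not numeric, so real data may need a guard that this sketch leaves out.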
The steps to retrieve a child element and its content are fairly complex in DOM. I'll show you all the
steps here. First I'll create a new element that I'll just call node. And I'll get that reference by calling my
custElement object's getElementsByTagName method again. And I'll pass in the string that's the
name of the element I'm looking for. Now, the problem here is that getElementsByTagName always
returns a node list. It assumes there will be more than one,
even if there's only one element of a particular name. You'll still get back a node list. So if you, the
programmer, know that there's only one element with a particular name in this scope, that is, within this
customer element, you can safely retrieve the first item in the node list by calling the item
method and passing in a value of zero. And then finally, cast the result as an Element. This seems like a
lot of code and it is.
And you'll find when you parse XML with other APIs, such as JDOM, that this can be enormously
simplified. But the Document Object Model is following a programming model that's dictated by the
W3C. And this is how you do it. So now I have a node, and I want to get the value of either the text node
or a CDATA section that might be a child of the element. Fortunately, there is a convenience method for
this task called getTextContent, and I'll get its value by calling getTextContent.
And then finally, I'll pass that value to the name of the customer, using customer.setName, and I'll pass
in content. Now again, this is a lot of code just to get a text value. And you will find you have to do
these steps every time for every element. So this bit of code is a good candidate to put into its own
separate method. I'll select those two lines of code, right-click, choose Refactor and Extract Method, and
I'll name it getTextFromElement.
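A sketch of such a helper follows, shown here in a form that takes the child element's name as a parameter. The class name and sample XML are made up, and the helper assumes the named child exists; with a missing element, item(0) returns null and the call fails.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class TextFromElementSketch {

    static Document parseString(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Assumes exactly one child element with this name inside custElement.
    static String getTextFromElement(Element custElement, String elementName) {
        // getElementsByTagName always returns a NodeList, so take the first match.
        Element node = (Element) custElement.getElementsByTagName(elementName).item(0);
        // Works for a plain text node and for a CDATA section alike.
        return node.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseString("<customers><customer id=\"1\">"
                + "<name>Ann</name>"
                + "<about><![CDATA[Likes <b>XML</b> & Java]]></about>"
                + "</customer></customers>");
        Element custElement = (Element) doc.getElementsByTagName("customer").item(0);

        System.out.println(getTextFromElement(custElement, "name"));
        System.out.println(getTextFromElement(custElement, "about"));
    }
}
```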
Then, I'll modify this method by adding a new argument. It'll be a string, and it'll be called
elementName, and here I'll take out Customer.NAME and replace it with that argument. And now this
method is reusable. Then I'll come back up here to where I'm calling the method, and when I call
getTextFromElement, I'll pass in the custElement and the name of the child element I'm looking
for. Now to test all of that I'll go back to ReadXMLWithDOM and I'll add a little bit of debugging code.
I'll add a for-each loop and for each customer in the data list, I'll output the value of the customer object
itself. You might remember from previous exercises that the Customer class has a toString method
which outputs the ID, the name, and the joined date if it's there. I'll save and run that code. And see
that it won't work quite yet, until I have a good date value. And so I'll come back to DOMReader again,
and this time I'll parse the joined value.
I'll create a string called joined, and I'll get its value by calling getTextFromElement. I'll pass in the
customer element, and the name of the joined element.
Then I'll parse that value, and this is where the XML date format constant will be used. I'll get rid of
this SuppressWarnings annotation, because now I'm going to use this constant. I'll come back down here.
I'll create a DateFormat object named df. I'll instantiate it using new SimpleDateFormat, and pass in
the formatting string.
Then I'll call customer.setJoined and I'll pass in df.parse, and I'll pass in the joined string. This parse
method can throw an exception, so I'll use a quick fix and surround that code with a try/catch. I'll save,
and come back to my main class and run it. And there's the result. I'm now successfully retrieving the
ID, the name and the date. So, here's a challenge for you: finish this code by parsing all of the other
child elements of the customer element.
Remember, they aren't all strings. The age is an integer, balance is a BigDecimal, active is a boolean, and
about is text, but it's a CDATA section instead of a text node. Follow the same coding patterns that I've
shown so far, and when you're done, look at the next movie and I'll show you the solution.
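The date-parsing step from this movie can be sketched in isolation as shown below. The pattern assigned to XMLDATEFORMAT is an assumption, since the constant's actual value is not shown in this part of the transcript, and the class and method names are hypothetical.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

class JoinedDateSketch {

    // Assumed pattern; the course's real XMLDATEFORMAT value is not shown here.
    static final String XMLDATEFORMAT = "yyyy-MM-dd'T'HH:mm:ss";

    // Converts the text of the joined element to a Date, or null if it cannot be parsed.
    static Date parseJoined(String joined) {
        DateFormat df = new SimpleDateFormat(XMLDATEFORMAT);
        try {
            return df.parse(joined);
        } catch (ParseException e) {
            // parse throws a checked exception when the text does not match the pattern.
            e.printStackTrace();
            return null;
        }
    }

    public static void main(String[] args) {
        Date joined = parseJoined("2014-03-15T10:30:00"); // made-up sample value
        System.out.println(joined);
    }
}
```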

Handling XML namespaces and prefixes with DOM


As I described in the chapter on parsing XML with the Simple API for XML, or SAX, there are a lot of
XML files you might need to deal with that use namespaces and prefixes. In my data provider project's
data folder, I have a file called NSCustomers.xml. This file has the same data structure as the other XML
file, but it has a namespace declaration, with the namespace string here, and an equivalent prefix of cust.
I'll show you how to parse this kind of file with DOM.
I'll copy that namespace to the clipboard. Then, I'll go to this project, DOMNamespaces, and I'll open
the class DOMReader. Up at the top of the class, I'll create a new field, private, static, final, and string,
that I'll name NSURI, for namespace URI, and I'll paste in that string. Now, as I promised in the
previous movie, this class has the completed code to parse all of the XML.
In addition to the name and joined date values, I have code in place now to parse the other values,
retrieving the phone, age, balance, about and active values, and converting text to appropriate data types
where necessary. There's a single method called getTextFromElement that's being used to retrieve all of
these values. So, now, how do you deal with namespaces? First, I'll show you what happens if you
don't deal with them. I'll go to my ReadXMLWithDOM class that's a part of this project and I'll change
the name of the file that I want to parse to NSCustomers.xml, and when I try to parse the file, I get
nothing. The problem is that I'm looking for elements with names like customers, name, phone and so
on, but without turning on namespace awareness, the parser gets names like cust:customers.
It doesn't know how to break those values apart. So, the first step in working with namespaces is to
turn on namespace awareness. You do this with the factory object before you create the document
builder. I'll go to DOMReader, and I'll add a line after I created my DocumentBuilderFactory, and I'll
call factory.setNamespaceAware, and pass in a value of true. So far so good. Now, each time I look
for an element of a particular name, instead of using the method getElementsByTagName, I'll use a
different version of this method, called getElementsByTagNameNS, for namespace.
This method takes two arguments instead of one. The first argument is the namespace URI, and I'll
pass in that constant I just created, NSURI, and the second value is the name of the element. I'll make
the change here where I'm retrieving all of the customer elements, then I'll go down to my reusable
method, getTextFromElement, and I'll make the same change here. Instead of using
getElementsByTagName, I'll use getElementsByTagNameNS.
And then, just as I did before, I'll pass in the namespace URI as the first argument of the method. I'll
save that change. I'll go back to my main class, and run the code again. And now I'm once again
successfully retrieving all of the data. So now we've seen how to parse XML files, whether or not they
include namespaces and prefixes. And I'll show you one more thing that's important to be aware of
when you're working with DOM. When you parse an XML file with the Document Object Model, you're
reading the entire data set into memory all at the same time.
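The namespace steps described above can be sketched with JDK classes only. This is a minimal illustration, not the course's DOMReader: the namespace URI, the inline sample XML, and the class and method names are assumptions made for the example. It compares a plain getElementsByTagName lookup on a parser without namespace awareness against getElementsByTagNameNS on a namespace-aware parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class NamespaceAwareDomSketch {
    // Hypothetical namespace URI; the real string from the course file is not shown here.
    static final String NSURI = "http://www.example.com/customers";

    // Tiny stand-in for the namespaced customer file, using the cust prefix.
    static final String SAMPLE =
        "<cust:customers xmlns:cust=\"" + NSURI + "\">"
        + "<cust:customer id=\"1\"><cust:name>Ann</cust:name></cust:customer>"
        + "<cust:customer id=\"2\"><cust:name>Bob</cust:name></cust:customer>"
        + "</cust:customers>";

    // Parses the XML string, with namespace awareness switched on or off.
    static Document parse(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Must be set on the factory before the builder is created.
            factory.setNamespaceAware(namespaceAware);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Looks for "customer" by tag name; the actual tag name is "cust:customer".
    static int countByPlainName(String xml) {
        return parse(xml, false).getElementsByTagName("customer").getLength();
    }

    // Looks for "customer" by namespace URI plus local name.
    static int countByNamespace(String xml) {
        return parse(xml, true).getElementsByTagNameNS(NSURI, "customer").getLength();
    }

    public static void main(String[] args) {
        System.out.println("plain lookup: " + countByPlainName(SAMPLE));
        System.out.println("namespace lookup: " + countByNamespace(SAMPLE));
    }
}
```

The plain lookup finds nothing because the element's tag name includes the prefix, while the namespace lookup matches on the URI and the local name regardless of which prefix the file uses.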
For a reasonably sized data set this works fine. It's not as fast as streaming, but it'll work. To judge this,
I'm going to use a class called Stopwatch. This is a class that's a member of the linked data provider
project. I'll instantiate it using new Stopwatch, and then I'll call its start method, and I'll pass in a label
of parsing XML. Then, I'll place the cursor after the call to my reader class, where I'm getting the data
from the XML file, and I'll call watch.stop.
When I call the stop method, I'll get some debug output telling me how long the operation took. For
this process, I won't need to output all the customer data, so I'll comment that out, and I'll run the code,
and I'll see that parsing this XML file took about a half a second on my computer. But now let's see
what happens when we try to parse a large data set. I'm going to try to parse this file,
NSCustomersLarge.xml, it's also a part of the data provider project.
I'll change the name of the file, I'll save, and I'll try to run. Now depending on your computer's
resources, you'll see one of two things. Either the parsing will complete, and it'll tell you how long it
took, or you'll run out of heap memory. It depends on your system resources, how much memory you
have, and how your Java Runtime is set up. If you run out of heap memory, you can increase the maximum
heap size in the Java virtual machine settings. But either way you'll see that working with DOM and
large data sets can take some time.
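The timing experiment described above can be approximated without the course's Stopwatch class, whose source is not shown in this transcript. The sketch below uses System.nanoTime as a stand-in and generates its own sample data; the class name, method names, and generated XML are assumptions for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ParseTimingSketch {
    // Builds an in-memory XML string with the requested number of customer elements.
    static String buildSample(int count) {
        StringBuilder sb = new StringBuilder("<customers>");
        for (int i = 1; i <= count; i++) {
            sb.append("<customer id=\"").append(i).append("\"><name>Customer ")
              .append(i).append("</name></customer>");
        }
        return sb.append("</customers>").toString();
    }

    // Parses the whole document into a DOM tree and counts the customer elements.
    static int parseAndCount(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("customer").getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the elapsed wall-clock time of one parse, in milliseconds.
    static long timeParseMillis(String xml) {
        long start = System.nanoTime();
        parseAndCount(xml);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        for (int size : new int[] {1_000, 100_000}) {
            String xml = buildSample(size);
            System.out.println(size + " customers parsed in "
                + timeParseMillis(xml) + " ms");
        }
    }
}
```

Timings from a single run like this are rough, since the first parse also pays for class loading and JIT warm-up, but they are enough to see how parse time grows with document size.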
This operation on my computer took almost three seconds. That might not sound like a lot, but in a web
environment or a mobile environment that can be far too long. For large data sets, I strongly
recommend streaming instead of DOM style programming for parsing data. But the great advantage of
document object model style programming in parsing XML is that the DOM API is available
everywhere you have Java. It's a part of Android. It's a part of the standard JDK, and it should be
available pretty much anywhere you can program with the Java programming language.

Searching a DOM object tree with XPath


After reading an XML file into memory, the next most common task is to search the data set, and one
of the tools that you can use is XPath, an expression-based language that can be used for filtering XML
data sets. The full power of XPath is beyond the scope of this course, but I'll show you the code you
need to use an XPath expression and extract data from a DOM object tree. I'm working in a version of
the project called DOMSearchXPath and in its DOMReader class, it's retrieving all of the data from
an XML file.
It's using getElementsByTagName to say, give me all the elements named customer. I'm going to
replace that little bit of code with a little bit of filtering, so that we only retrieve customer elements
that match certain criteria. First, I'm going to refactor this code a bit. I'll go back to ReadXMLWithDOM,
and I'm going to change the way that I call this method, getDataFromXML, so that I'm passing
in two arguments. The first value will still be the file name, customers.xml, from the data provider
project, and the second will be an XPath expression.
I'm going to use the following XPath expression. Double slash in XPath means find an element
anywhere in the object tree. Then, I'll put in the name of the element I'm looking for. Then, within a
pair of brackets, I'll add a criterion of age greater than or equal to 65. Now again, this is just one sliver of
what's possible in XPath. You can apply all sorts of filtering expressions, but this will do for our
purposes. I'll save that change, and then go to the DOMReader class and I'll refactor this method,
getDataFromXML, so that it's expecting an XPath expression as a second argument.
The second argument will be a string and I'll call it filter. Now the next step is to add some new objects.
I'll come down here, and I'm going to get rid of some of this commented code. I don't need that
anymore, and I'm also going to comment out that code that's retrieving all of the customer elements,
and I'll replace it with the code you need to execute an XPath filter. I'll start by creating an instance of
a class called XPathFactory, which is a member of the package javax.xml.xpath.
I'll name the object xFactory, and I'll get its reference by calling a static method of the XPathFactory
class called newInstance. Now using the factory, I'll create an instance of a class called XPath. This is
the object that will actually do the filtering. I'll name this object xpath, all lower case, and get its
reference using the factory object's newXPath method. So I've created the factory, and the factory has
created the XPath object.
Next, I need to create something called an XPath expression. The XPathExpression class represents a
compiled version of the XPath expression, starting off as a string and turning into an in-memory object
that can be used to do the filtering. I'll name this object exp, and I'll get its reference by calling the
XPath object's compile method and passing in the string-based filter. And now I'm ready to execute the
operation. Just as I did when I called getElementsByTagName, I'll create a NodeList which I'll call list,
and I'll get its reference by calling exp.evaluate.
There are a few versions of the evaluate method. I'll be using the version that has two arguments, an
object and something called a QName. For the object, I'll pass in my XML document. And then the second
argument is the return type. You represent the return data type using a constant that's a member of the
class XPathConstants, and specifically, I'll say that I'm returning a node set. Make sure you choose
NODESET here, and not NODE.
If you pass NODE, you'll get back a single node object containing a lot of stuff you don't need. You're
looking for a node set, which translates in Java to NodeList. Now when I complete this line of code, I
have a potential error, and I'll fix it with a quick fix, adding a cast to the NodeList class. I'll save those
changes, and see that I have some potential exceptions, so I'll go to the first line that has a potential
exception and I'll do a quick fix and add a throws declaration to the current method.
I'm throwing a class called XPathExpressionException. I'll save that change here, and then I'll go back
to my main class, ReadXMLWithDOM, and because this exception can bubble up through the call
stack, I'll need to add the same throws declaration here. And now my code is complete. Before I made
these changes, when I ran this code, I was getting back 1000 records, that is 1000 instances of the
customer class. So now let's see how many of our customers are 65 and over, and I get back 17 records,
and their details are listed here.
Once you have this code in place, you can experiment with different XPath expressions. XPath is a
very powerful expression language. It's made possible in Document Object Model programming by the
fact that DOM-based documents live in computer memory. Unlike the streaming APIs such as SAX, the
DOM API brings all your data into memory all at the same time. And then, you can execute these sorts
of expressions, to pull out just the subset of data that you are interested in.
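The XPath steps walked through in this movie (factory, XPath object, compiled expression, evaluate with a NODESET return type) can be sketched as follows. The object names xFactory, xpath, exp, list, and filter come from the narration; the class name, the inline sample data, and the exact filter string are assumptions for illustration, and exceptions are wrapped rather than declared with throws to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class XPathFilterSketch {
    // Small stand-in for customers.xml: three customers aged 70, 30, and 65.
    static final String SAMPLE =
        "<customers>"
        + "<customer id=\"1\"><name>Ann</name><age>70</age></customer>"
        + "<customer id=\"2\"><name>Bob</name><age>30</age></customer>"
        + "<customer id=\"3\"><name>Cy</name><age>65</age></customer>"
        + "</customers>";

    // Parses the XML into a DOM tree and counts the nodes matched by the filter.
    static int countMatches(String xml, String filter) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            XPathFactory xFactory = XPathFactory.newInstance();
            XPath xpath = xFactory.newXPath();
            // Compile the string-based filter into a reusable expression object.
            XPathExpression exp = xpath.compile(filter);
            // NODESET comes back as an Object, so it is cast to NodeList.
            NodeList list = (NodeList) exp.evaluate(doc, XPathConstants.NODESET);
            return list.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath search failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("all customers: " + countMatches(SAMPLE, "//customer"));
        System.out.println("65 and over: "
            + countMatches(SAMPLE, "//customer[age >= 65]"));
    }
}
```

In a real reader class, the loop over the returned NodeList would build Customer objects instead of just counting the matches.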

Part 4: Creating and Parsing XML with JDOM
44m
How JDOM works
The JDOM API was designed as an improvement over Document Object Model programming. JDOM
is an independently managed open-source project and you can download the library and the
documentation from www.jdom.org. JDOM includes tools both for reading and for writing XML
content. Some of the most important characteristics of JDOM include that it's a tree-based XML
processor. Just like DOM, it stores all of its data in memory at the same time, and this has pros and
cons.
When JDOM reads XML content, it depends on an underlying SAX parser. So, it's assumed you're
running in a Java development environment that has SAX, but in the current versions of the Java
Development Kit and the Android runtime, SAX is always available. In order to use JDOM, you need to
get the JAR library for JDOM and add it to your Java application. This means that you're increasing the
size of your application, and on Android, that might be a concern.
But you'll see that the JDOM JAR file is pretty small. And also important to know is that if you want to
use JDOM on Android, you'll need JDOM version 2.0.1 or later. If you need a tree-based API and you're
deciding between JDOM and DOM, here are some things to consider. First, JDOM is designed to use concrete
superclasses instead of interfaces. This makes the programming model a little bit simpler. Also it has a
lot of convenience methods, in XML programming you tend to do the same tasks over and over again.
Finding a set of objects. Getting the text values from those objects. And JDOM tries to provide
convenience methods to make these common tasks much easier and take less code. Some of the
common issues with JDOM include the same things you run into with DOM or any tree-based API.
Large documents can challenge available resources. You can run out of heap space if you try to read a
very large document into memory. Also JDOM like other tree-based APIs, is typically slower than a
streaming processor.
But it has the benefit of letting you search the entire data set in memory all at the same time. Here is a
simple example of using JDOM to parse XML. The first step would be to read the document into
memory. Just like with DOM, in this example I'm starting off with a File object that's pointing to the file
containing my XML. Then I create an instance of a JDOM class called SAXBuilder. And then a JDOM
Document. This is a different Document class than the DOM version, and when you enter your code,
you need to make absolutely sure that you're referring to the right class from the right package.
Then finally, you call the builder object's build method. Just like DOM's parse method. This can accept
a variety of sources. Files, input streams and so on. Once you have the data in memory, you can then
traverse the XML tree, just like you do in DOM. In DOM, the primary supertype for all XML nodes
is called Node. In JDOM, it's called Content. And classes such as Element, Comment, Text, and CDATA,
which is really a subclass of Text, all extend the Content class.
To walk the XML tree you might start by calling the getRootElement method. This is similar to the
getDocumentElement method in DOM. It gives you a reference to the element that contains all of the other
elements of the tree. To get references to the children of an element, you can call a method named
getChildren. In DOM, you would call getChildNodes, which returns a NodeList, and then as you loop
through the node list, you have to cast its objects as elements. But in JDOM there's an assumption
that the children of elements are also elements.
And so you get back a Java List which contains instances of the JDOM Element class. And because the
List is a part of the Java collections framework, you can use the for-each loop style to loop through and
handle each child element one at a time. To get text data from an element, you can call a method named
getChildText, that handles a bunch of tasks for you. Let's say you have an element named customer,
which has a child element called name.
In DOM, you'd have to walk down the tree to get the name element, then get its text value. In JDOM,
you do it all in a single statement with getChildText. And it returns the string value of the text node or
CDATA section that's a member of that child element. Again, the goal is to reduce the amount of code.
When using JDOM, you typically have to watch out for the same things you do with DOM. The large
document problem, which for all tree-based APIs, can cause Java to run out of heap space.
And, you have the same solution available as with DOM, increasing heap space with the -Xmx virtual
machine argument. Specifically with JDOM, because it's not built into the JDK, you have to add a
JAR file to your application. For many environments, that's not an issue, but if you're building say, an
app for Android, you have to decide whether the increased app size is worth the easier programming
that JDOM gives you. So those are some of the important things to know about JDOM, and now, as we
have with previous APIs, let's take a look at the code.
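For contrast with the JDOM convenience methods described above, here is a rough sketch of the DOM-side work that getChildren and getChildText are meant to replace. It uses only JDK classes so it runs without the JDOM JAR. The body of getTextFromElement is an assumption, since the course's version of that method is not shown in this transcript, and the class name and sample XML are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomTraversalSketch {
    // Sample with line breaks and indentation, which DOM keeps as text nodes.
    static final String SAMPLE =
        "<customers>\n"
        + "  <customer id=\"1\"><name>Ann</name></customer>\n"
        + "  <customer id=\"2\"><name>Bob</name></customer>\n"
        + "</customers>";

    // Parses the XML and returns the root element (DOM's getDocumentElement).
    static Element rootOf(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Counts every child node of the root, including whitespace text nodes.
    static int countChildNodes(String xml) {
        return rootOf(xml).getChildNodes().getLength();
    }

    // Counts only the child nodes that are elements.
    static int countChildElements(String xml) {
        NodeList nodes = rootOf(xml).getChildNodes();
        int count = 0;
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    // Assumed helper: finds the first descendant with this name and returns its text.
    static String getTextFromElement(Element parent, String childName) {
        NodeList list = parent.getElementsByTagName(childName);
        return list.getLength() == 0 ? null : list.item(0).getTextContent();
    }

    // Walks to the first customer element, casting from Node, and reads its name.
    static String firstCustomerName(String xml) {
        NodeList nodes = rootOf(xml).getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            if (node instanceof Element) {
                return getTextFromElement((Element) node, "name");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("child nodes: " + countChildNodes(SAMPLE));
        System.out.println("child elements: " + countChildElements(SAMPLE));
        System.out.println("first name: " + firstCustomerName(SAMPLE));
    }
}
```

The node-type check, the cast, and the helper method are the parts that JDOM folds into getChildren and getChildText.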

Creating an XML document with JDOM


To get started with JDOM, download the library from this website, jdom.org. From the home page,
click the binaries link under downloads. And then download the most recent stable build. I'll be using
version 2.0.5 in these exercises. But you should be able to use this, or any later version. Download and
extract the file to your hard disk. I've extracted it to my desktop. When you open the resulting folder,
you'll find a bunch of JAR files.
The only one you need is this one, jdom-2.0.5.jar. Copy that JAR file to the clipboard. Then go to Eclipse
and import the project JDOMCreateDocument. This is a beginning project that doesn't have any JDOM
code yet. Go to the libs folder and paste the JAR file into place, then right-click the JAR file and select
Build Path, Add to Build Path. If you aren't using Eclipse, follow the process in your IDE to add the JDOM
JAR file to the Java build path.
Now, all of the classes in JDOM are available to your project. The next step is to add a little bit of
JDOM code. I'll go to the class JDOMCreator that's a member of this project. The only bit of code it
has right now is a constant that isn't currently used. I'm going to create a public method that will return
an instance of a Document class. If you've followed through the chapter on Document Object Model,
you'll remember that there was a Document class that was used in DOM.
This is a different class. It's the JDOM Document. So, when you import it by pressing Ctrl+Space, be
sure to choose the version of Document that's a member of the package org.jdom2 and not org.w3c.dom.
Notice from the icons that this is a concrete class and not an interface. I'll name the new method
createXMLDocument, and it's going to accept a list of Customer objects. I'll need to import both the Customer
class, which is coming from the data provider project that's a part of the build path of this project, and
I'll also need to import the List interface from java.util.
I'll name the list that's being passed into the method data. And then, within the method, I'll create a new
instance of the Document class. I'll type Document, and once again, make sure that I'm choosing the one
from org.jdom2, and I'll name it doc. And to instantiate it, I'll use the no-arguments constructor
for this class, new Document. Notice how much simpler this is than in Document Object Model. In
DOM you create a factory, and then a builder, and then the document. In JDOM you just create the
document.
Similarly, to create an element you instantiate a concrete class. I'll type Element and press Ctrl+Space
and I'll choose the Element class from org.jdom2 and I'll name it root. Then, I'll instantiate it with the
constructor method from the Element class and I'll pass in the name of the element, which will be
customers. Finally, I'll attach that element to the document. This step is similar to DOM programming,
but instead of using a method called appendChild, I'll use a method called addContent.
I'll call the document object, then addContent, and I'll pass in the root element. The addContent method
expects an argument typed as Content. Content is the superclass of Element and many other JDOM
classes. So you can pass in an element, a text node, and many other types of content. Next, I'll add
customer elements to the root. I'll do a for-each loop, and for each Customer in the data list, I'll create a
new element.
I'll name it customer and I'll add it to the root. I'll declare the Element object and name it custElement.
Once again, I'll instantiate it with the constructor method that expects a string, the name of the element,
and I'll name it customer and then I'll attach that to the root, calling root.addContent, and I'll pass in the
customer element. So now I have a document that has a root element named customers and child
elements named customer, one for each item in the list.
And finally, I'll return the document from this method. Notice that compared to DOM programming,
JDOM simply requires less code. As I mentioned before, there were no factories or builders. Just
concrete classes. And there's a lot less error handling that's needed. The goal is simplicity. Now, I'll
test this code. I'll go back to my main class, CreateXMLWithJDOM. I'll create an instance of the class I
was just working on, which is named JDOMCreator. I'll name it creator and I'll instantiate it with its
no-arguments constructor. Next I'll call the method that I was just working on.
I'll create an instance of the Document class, again, making sure that I'm choosing the version from
org.jdom2, and I'll name it doc. And I'll call creator.createXMLDocument, and pass in my data object. And
that's the list of customers. Next, I'll test this by creating a list of elements. I'll fill in the list of elements
by calling the document object's getRootElement method. That returns the root element that I created,
and then from there I'll call getChildren.
And notice that JDOM assumes that the children of an element are elements. It doesn't worry about the
whitespace or other content that DOM makes you deal with. And then I'll find out how many elements
were created by using some system output, and outputting the size of the list. I started off with ten data
items, because I passed in a value of SMALL to the data provider's get data method. So I should get ten
child elements back. I'll run the code, and that's exactly what I get.
If I change this to DataProvider.MEDIUM and run it again, this time I get back a document with a
thousand child elements. So that's the beginning of creating an XML document with JDOM. Again, it's
a lot less code than with DOM, and it's easier to maintain in the long run. The next step will be to fill in
the data in those child elements and you'll see that JDOM provides a lot of convenience methods to
make those tasks a lot easier as well.

Adding data to an XML document with JDOM


Once you've created a JDOM document and given it a root element, the next step is to start giving it the
data. In this version of the project, JDOMAddData, my JDOMCreator class has code to create the root
element and attach it to the document and then attach customer elements to the root. Next we'll need to
add child elements. I'll go back to the example of the XML file that we're trying to create. It has the
root element of customers, a child element of customer with an ID attribute, and child elements for
everything else.
Everything is represented as text nodes, except for the about child element, which has a CDATA section.
So going back to JDOMCreator. First, I'm going to create a method that knows how to create a new
child element and attach it to the customer element and set its text value. I'll place this new method
under the createXMLDocument method. I'll make it private and void, and I'll name it addChildElement.
It'll take three arguments.
The first will be a JDOM Element that I'll name parent, the second is a string that I'll call element name,
and the third is a string called text value. The first step is to create the new element. I'll call it child,
and instantiate it with the Element constructor, and I'll pass in the element name argument. Then I'll set
the text value of the element. In DOM, you can create a text node and attach it as content, or you can
call a convenience method.
And JDOM does the same thing. The simplest way of adding a text node is to call the element's setText
method, passing in the text value. And then finally, you'll need to add the child element to the
parent, calling parent.addContent and passing in the child object. So now we have a very simple way of
adding all the elements. And I'll go back up here to the for loop. Within the for loop, I'll add a couple of
the child elements.
I'll call the addChildElement method that I just created and I'll pass in custElement as the parent element.
The first child element of all customers is the name. So for the element name I'll use a constant of the
Customer class called NAME, that has a value of name as a lower case string. And then for the value,
I'll pass in customer.getName. And that's the name of the current customer data object. Now I'll
duplicate this line of code a few times. And as I did in DOM, I'll go through and change the names of
the child elements.
The next one will be phone, then age, then about, balance, active, and I need one more, and the last one
will be joined. Then, I'll pass in the data. The phone value is a string, so I don't need to do anything
special there. Just pass in the value from the getPhone method. The age is an integer, so I'll change the
method I'm calling and then I'll wrap that value in Integer.toString. The about value is a simple string.
I haven't done anything to turn it into a CDATA section yet, but I'll come back to that in a moment. For
the balance, I'm working with a BigDecimal object. And I can call its toString method by appending that
call after I call getBalance. The active property is a primitive boolean. So I'll wrap that in the Boolean
class's toString method, and then just as with DOM, dates require special handling. I'll create a
DateFormat object that I'll call df, and I'll instantiate it with new SimpleDateFormat, and I'll pass in my
XML date format constant that's declared at the top of the code.
When I pass in a value, I'll use the getJoined method, and I'll wrap that in the DateFormat object's
format method. And now the format of the date will be determined by this constant at the top. I'll get
rid of that suppress warnings annotation, that's not needed anymore. And now all of my child elements
are in place. Finally, I'll add the attribute. I'll do this right here, before I add all the child elements. This
code looks exactly the same as in DOM. I'll use custElement.setAttribute, and I'll pass in two strings.
The first will be the name of the attribute, which I'll get from the Customer.ID constant. And then, I'll
pass in customer.getId, and because that's being returned as an integer, I'll turn it into a string by
wrapping it in Integer.toString. And now my XML document should be complete. I'll save those
changes and come back to my main class and I'll test it by adding a very simple set of code that will
turn this into an actual XML string.
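The value conversions described above (integer, BigDecimal, boolean, and date to text) are plain Java and can be tried without JDOM. In this sketch the date pattern is an assumption, because the value of the course's XML date format constant is not shown in the transcript, and the time zone is pinned to UTC only so the output is reproducible. The class and method names are illustrative.

```java
import java.math.BigDecimal;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class ValueToTextSketch {
    // Assumed pattern; the course's actual date format constant is not shown.
    static final String XMLDATEFORMAT = "yyyy-MM-dd";

    // An int has no methods, so it is wrapped in Integer.toString.
    static String ageText(int age) {
        return Integer.toString(age);
    }

    // BigDecimal is an object, so its own toString method can be called.
    static String balanceText(BigDecimal balance) {
        return balance.toString();
    }

    // A primitive boolean is wrapped in Boolean.toString.
    static String activeText(boolean active) {
        return Boolean.toString(active);
    }

    // Dates need a DateFormat so the text follows one fixed pattern.
    static String joinedText(Date joined) {
        DateFormat df = new SimpleDateFormat(XMLDATEFORMAT);
        // Pinned to UTC for this example so results do not vary by machine.
        df.setTimeZone(TimeZone.getTimeZone("UTC"));
        return df.format(joined);
    }

    public static void main(String[] args) {
        System.out.println(ageText(42));
        System.out.println(balanceText(new BigDecimal("12.50")));
        System.out.println(activeText(true));
        System.out.println(joinedText(new Date()));
    }
}
```

Each returned string is the kind of value that would be passed as the text argument when adding a child element.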
You'll see that this is a lot simpler than in Document Object Model programming. It requires three
stages. First, I'll create an instance of a class called XMLOutputter. This is a member of
org.jdom2.output, and I'll call it outputter. I'll instantiate it initially with a no-arguments constructor call.
Next, I'll turn my XML document into a string. I'll create a string that I'll call xml string, and I'll get its
value by calling the outputter object's outputString method.
There are a number of versions of this method, taking different types of XML objects. I'll use the one
that looks for a Document object and pass in my doc variable. And then finally I'll output that to the
console with system output. Remember how much code it took to do this in DOM using the TrAX API
and the TransformerFactory and Transformer classes. In JDOM 2, that's all done for you. I'll test the
code by clicking the run button, and I'll see that I get XML, but it's not well-formatted.
Here's how you format it. I'll go to the XMLOutputter constructor call and I'll pass in a value that's a
member of a class called Format. This is a part of the same package, org.jdom2.output, and the value
will come from this method, getPrettyFormat. There's also a getCompactFormat, and one other similar
method. I'll save and run, and there's the result. Nicely formatted XML. I've changed the amount of
data that I'm retrieving from the DataProvider, using the SMALL constant, that means ten, and run the
code again, and I'll see that I get a complete XML packet, starting off with the XML declaration at the
top, and containing the root element, the customer child elements with the ID attribute, and all of the
child elements.
The last step in creating the XML document is to use a CDATA section for the about element, and I'll
show you how to do that in another movie.

Wrapping text in CDATA sections with JDOM


A CDATA section is used to wrap text that can contain reserved characters that might otherwise be
turned into what are known as entities in XML. These include ampersands, quotes, and some other
characters. Here's how you can wrap text inside a CDATA section using JDOM. The coding style is
pretty much the same, and I'll demonstrate it with the about child element of my customer data set.
Right now, I'm creating an about child element and passing in the raw text value, and that's creating a
text node.
And if that text contains these reserved characters, it might cause some issues. So, when I first create
the element, I'll pass in a blank string instead. I'm also going to change the code so I can get a
reference to the element that's just been created. I'll come down to addChildElement and I'll change the
return type from void to Element. And at the end of the code, I'll return the child element. Now, I'll
come back to the code where I'm creating the about child element, and I'll get a reference to the
element that's just been created, that I'll name about.
Next, I'll create an instance of a class simply called CDATA, which is a member of the package
org.jdom2. Again, be sure you're using the right version of the class. In DOM, this is called
CDATASection and it's an interface, and in JDOM 2, it's just called CDATA and it's a class. I'll name the
object cdata, in lower case, and I'll instantiate it using the class's constructor method, and I'll pass in the
value I want to wrap.
I'll get that in this case from customer.getAbout. Then finally, I'll attach that CDATA section to the
element, using the addContent method. Just like Element, CDATA is a subclass of Content, and that's it.
Now, my about element will have its text wrapped in a CDATA section. I'll run the code to test it and
I'll see that for each customer element, the about element contains a CDATA section, and the CDATA
section contains the text.
So, JDOM makes it very easy to create text nodes with a single method, but when you want to use
CDATA sections, you need to break up the code a little bit, creating an intermediate object, the CDATA
section, and adding it to the parent element.

Outputting an XML file with JDOM


I've previously described how to output an XML document in JDOM to a string, using the class
XMLOutputter. The XMLOutputter class has an outputString method. And you can pass in a Format object to
determine the format of the XML string. And of course, once you have a string, you can output that
anywhere. But if your goal is to create an XML file, say on disk, you can use a shortcut. I'm working in
a project called JDOMToXMLFile.
And I'll add just a couple more lines of code. I'm in the main method of the CreateXMLWithJDOM
class. I've already created the XMLOutputter object and used it once to create the XML string. And I
don't need to recreate it. I can use that object again to create a file. The outputter object has a method
called output. Just like the outputString method, you can pass in a number of different types of XML
objects: documents, elements, CDATA sections and so on.
But then the output method has a second argument, and you can pass in an output stream or a writer
object. I'll use the Java FileWriter class. I'll type in the name of the class, and add the import, and name
the object writer. And I'll instantiate it with its constructor method, and I'll pass into that an instance of
the File class. And I'll pass into that the location and name of the file I want to create, as a literal string.
I'll place the new file in an output folder under the current project,
using ./output/ and then the name of the file I want to create: customers.xml. When I create this code, I'm
told that there's a potential exception. So I'll use a quick fix and add a throws declaration for
IOException. And now, I'm ready to write the file to disk. I'll once again use the outputter object, and
call the output method. Notice all the combinations of the arguments. Again, the first argument can be
any XML type: document, comment, CDATA, element and so on.
And the second argument can either be an output stream or a writer. I'll pass in my document object,
named doc, and my writer object that I just created. Before I run this code, I'll verify that the project's
output folder doesn't have anything in it. I'll click on the folder and press F5 to refresh, and I've verified
that it's empty. Now I'll run the code, and this still outputs the XML string to the console. But when I go
back to the Package Explorer view and press F5 again to refresh,
I see that the file is now there, and I can double-click to open it and see that it's a well-formed XML
file. So again, JDOM's goal is to simplify the code. What took five or six lines of code in DOM and the
TrAX API just takes two or three lines of code in JDOM.
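As a reminder of the DOM and TrAX side of that comparison, here is a minimal sketch that builds a tiny DOM document and serializes it through a Transformer to any Writer. It uses only JDK classes; the class name, method names, and sample data are assumptions for illustration and are not taken from the course's DOM project.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomTraxOutputSketch {
    // Builds a DOM document with one customer, using the factory and builder steps.
    static Document buildDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
            Element root = doc.createElement("customers");
            doc.appendChild(root);
            Element customer = doc.createElement("customer");
            customer.setAttribute("id", "1");
            root.appendChild(customer);
            Element name = doc.createElement("name");
            name.setTextContent("Ann");
            customer.appendChild(name);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build document", e);
        }
    }

    // Serializes the document to the given writer through the TrAX API.
    static void write(Document doc, Writer writer) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
        } catch (Exception e) {
            throw new IllegalStateException("Could not write document", e);
        }
    }

    // Convenience wrapper that collects the output in a string.
    static String toXmlString(Document doc) {
        StringWriter writer = new StringWriter();
        write(doc, writer);
        return writer.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXmlString(buildDocument()));
    }
}
```

Passing a FileWriter instead of a StringWriter to the write method would send the same output to a file on disk, which is the step that the JDOM output method handles in a single call.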

Parsing an XML file with JDOM


The JDOM library includes a set of classes that make it very easy to parse and extract data from XML
files and XML strings. I'm working in a project named JDOMReadFromFile. It has a new package
ending with read, which has two classes. ReadXML with JDOM is my main class, and it creates an
instance of a class called JDOM Reader in the same package. And then calls a method named
getDataFromXML. And it passes in the name and location of an XML file. This XML file is a part of
the data provider project that's linked to this project. I'll jump to this method getDataFromXML. Which
receives the name of the XML file on disk, it creates a list of customer objects and then returns that
empty list, our goal is to add code that opens and an pauses and XML file and extracts the data, and
then converts it to a list of plain old Java objects. I'll start here, after the declaration of the list, and I'll
begin by creating an instance of the Java file class, from java.io. I'll name it file, and I'll instantiate it
using its constructor method, and I'll pass in the file name argument that was passed into this method.
Next, I'll create an instance of a class from the JDOM library named SAXBuilder. It's part of the
package org.jdom2.input. In order to read an XML file or stream into memory, JDOM uses SAX, the
Simple API for XML that I described in an earlier chapter of this course. It uses the SAX architecture
to get the data. But then, like the Document Object Model, it creates a tree of objects in memory. So
just like DOM, in order to use JDOM to read an XML file, your system must have enough resources,
memory and so on, to store the entire data set in memory all at the same time. I'll name this object
builder, and instantiate it using the class's no-arguments constructor. Next, I'll declare an instance of the
JDOM Document class. As always, make sure you're importing the right class. This is Document from
org.jdom2, and I'll name it doc, and I'll initially set it to null. Next, on a separate line, I'll create
the document. I'll create the reference to the document by saying doc equals, and then I'll call the build
method of the SAXBuilder object. There are many versions of this build method. You can build from a
file, as I'm doing, but you can also build from an input source, an input stream, a reader, and so on. I'm
choosing the first version of the method, which expects a file, and I'll pass in my file variable, which
references the actual file because that value has been passed in from the calling class. When you call the
build method, there are a couple of possible exceptions. You can handle them by adding a throws
declaration to the current method, or you can wrap this code in a try-catch block. I'll use a quick fix, and
surround with try/multi-catch. If you're working in Android, this Java 7 syntax won't work, and you
should instead use a try block with multiple catch blocks. If I get into the catch block, after printing the
stack trace, I'll return null. But if I get past the try-catch block, then I'll be able to assume that the
document has been correctly parsed, and a tree of XML objects has been stored in memory. To prove
that this works, I'm going to go get some code from an earlier class that I created. I'll go to the create
package of the current project and open the file CreateXMLWithJDOM.java. I'll copy these three lines
of code, which use an XMLOutputter object to output the XML document as a string to the
console. Then I'll come back to my JDOMReader class, and I'll paste that code into place.
I'll save the changes, and I'll go back to my main class, ReadXMLWithJDOM, and run the code. And
there's the result: the XML file has been read into memory, it's stored as a tree of objects, and then the
XMLOutputter is being used to output it as an XML string again. So now that you know how to parse
the file and turn it into a tree of objects in memory, the next step is to extract the data from the
XML tree. And I'll get into those details in the next movie.
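
The parsing steps described in this movie can be sketched roughly as follows. This is a minimal sketch, not the course's actual JDOMReader class: it assumes the JDOM 2 JAR is on the build path, and the class name JDOMParseSketch and the default file name customers.xml are hypothetical.

```java
import java.io.File;
import java.io.IOException;

import org.jdom2.Document;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;
import org.jdom2.output.Format;
import org.jdom2.output.XMLOutputter;

// Hypothetical class name; requires the JDOM 2 library on the classpath.
class JDOMParseSketch {

    // Builds an in-memory JDOM tree from a file. Returns null on failure,
    // following the "print the stack trace, return null" approach in the narration.
    static Document parse(String filename) {
        File file = new File(filename);
        SAXBuilder builder = new SAXBuilder();
        try {
            return builder.build(file);
        } catch (JDOMException | IOException e) { // Java 7 multi-catch
            e.printStackTrace();
            return null;
        }
    }

    public static void main(String[] args) {
        // Assumed file name; pass a real path as the first argument.
        String filename = args.length > 0 ? args[0] : "customers.xml";
        Document doc = parse(filename);
        if (doc != null) {
            // Debugging output: serialize the tree back to an XML string.
            XMLOutputter outputter = new XMLOutputter(Format.getPrettyFormat());
            System.out.println(outputter.outputString(doc));
        }
    }
}
```

On a platform without multi-catch, the single catch clause would be split into one catch block for JDOMException and one for IOException.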

Getting data from XML with JDOM


Once you've parsed an XML file and turned it into a JDOM document, the next step is to extract its
data. Just as with the Document Object Model, you'll need to deal with both attributes and elements, but in
JDOM it takes less code. I'm working in a version of the project called JDOMGetData, and the code
is already in place to parse the XML file. I won't need this debugging code that's outputting the XML
document, so I'll comment it out. And I'll add code here, after the document has been built. The first step
is to get a list of the elements. In our XML file, each data element has a name of customer and it's a
child of the root element. So to get that data, I'll start by getting a reference to the root element. I'll
create a variable data typed as a JDOM Element, and I'll name it root. Then I'll get its reference by
calling the document object's getRootElement method. Next, I'll walk down one level from the root
element and get all of the child elements. To do this, call a method of the Element class called
getChildren(). You'll get back a list. This is a standard Java List, and each item in the list will be an
Element object. I'll name the list custElements, and I'll get its value by calling the root object's
getChildren method. There are a few versions of this. There's a no-arguments version that will return all
child elements regardless of name, one that lets you filter on name, and one that filters on name and
namespace. I'll use the one that filters on the name, and I'll look for all elements that have a name of
customer. Now that I have the list, I can iterate through it, and I'll use a for-each loop. Each time
through the loop, I'll be dealing with one customer element, so I'll name this Element object ce, for
customer element. First, I'll create a plain old Java object, data typed as Customer. This class is a member
of the data provider project that's linked to this project. I'll name it customer and I'll instantiate it with
its no-arguments constructor. Then I'll add it to the data object that's declared up here at the top of the
method. This is a list of Customer objects, set up as an ArrayList. So I'll take that customer object and
say data.add(customer). Next, I need to start filling in the customer object's data. I'll start with the ID
property. I'll get that from the ID attribute of the customer element. There are a couple of ways to do
this, and I'll show you both approaches. One approach is to create an integer value that I'll call id and
get its value by calling ce, that's the customer element, dot getAttributeValue. Notice that
getAttribute returns an Attribute object, while getAttributeValue returns a string. I'll say that I want to get
the attribute that has a name matching the ID constant of the Customer class. getAttributeValue always
returns a string, so then you have to convert that value to the required data type, which would be an
integer. So I'll wrap that expression in Integer.parseInt. So that's one approach. The second approach is to
call the getAttribute method and return an Attribute object. I'll declare an instance of the Attribute class
from org.jdom2. I'll call it att, and I'll get it by calling ce.getAttribute, and once again I'll
pass in the name of the attribute, as in Customer.ID. Now I'll convert this to an integer, using a
method of the Attribute class. I'll call my customer object's setter for the ID property, that's setId, which
expects an integer. And then I'll pass in att, that's the attribute, and I'll use one of these conversion
methods. There's getIntValue, getDoubleValue, float, long, and so on. I'll choose this one, getIntValue.
So this approach is a little bit cleaner. It still requires a couple of statements, but you can clearly
see where the conversion is happening. These conversion statements in JDOM 2 can throw exceptions,
so whenever you call them, you should either add a throws declaration to the current method or wrap the
code in a try-catch. I'll use a quick fix on this line of code and add a throws declaration, throwing
the class DataConversionException, which is part of the JDOM library. I'll get rid of
that earlier line of code, because I only need to get this value once, and now I need to start getting data
from elements. We're starting off with the customer element, which has child elements. Each child
element has a name such as name, phone, about, and so on. And then from that element we have to walk
down to the text value. Fortunately, JDOM makes this incredibly easy with a method called
getChildText. I'll call my customer object's setter method for the name property, setName, and then I'll pass in
ce, that's the customer element, .getChildText, and I'll pass in the name of the child element I'm looking
for, using a constant from the Customer class, Customer.NAME, and that's it. getChildText does
everything I need: walking down the element tree, getting the text node, and returning the text. Now
I'm going to duplicate this line of code five times. I'll change the setter methods that I'm calling as
follows. I'll call the setters for phone, about, age, balance, and I need one more, and this one will be for
active. Then I'll change the constants for the element names I'm looking for to match the setter
methods. The name, phone, and about lines are complete. getChildText can retrieve either a text node or
a CDATA section; the difference is invisible when you're retrieving the text. To get other data types, you'll have to
explicitly convert them. For the age value, I'll call Integer.parseInt and wrap that around the age value
that's returned from getChildText. For the balance value, I have to convert that into a BigDecimal. So
I'll call the class's constructor that can work with a string, with new BigDecimal. Be sure to
import the BigDecimal class. And then finally, for the boolean data type, wrap that in
Boolean.parseBoolean. The last value is joined, which is a date, and the process for that is exactly the same
as in the Document Object Model. Use the XML date format constant that's at the top of the code and then
create a DateFormat object. It'll look like this: DateFormat, be sure to import these classes. I'll name that
df and I'll instantiate it with new SimpleDateFormat, wrapped around my constant. Then I'll call the
customer object's setJoined method, which expects a date. And I'll pass into that df.parse, and I'll pass
into that ce.getChildText, and I'll pass into that the name of the element, Customer.JOINED. The date format
object's parse method can throw an exception, so I'll use a quick fix to add the ParseException class to the throws
declaration of the method. And so now I've extracted all of the data from the
XML file and created plain old Java objects that I can use anywhere else in my application. I'll
come back to my main class and add the required throws declarations to the main method, because
exception objects can bubble up from getDataFromXML, and then I'll add some debugging output. I'll
use a for-each loop and output the toString representation of each customer object. And here's my
test: I'll run the code, and I retrieve 1,000 rows of data and present them in the console. One good way to
decide whether you want to work with DOM or JDOM is to compare the two bits of code. You'll find
that DOM in general takes a lot more code to get the same work done, and that JDOM makes your
coding more efficient and logical. The only downside to JDOM is that you must include the JDOM
library in your Java application's build path. That will be true in a server environment, in an Android
app, or in any other Java environment, but the coding efficiency might be worth it.
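
The extraction steps above might look something like the following sketch. It is only an illustration under stated assumptions: it needs the JDOM 2 JAR on the build path, it parses a small inline XML string instead of the course's data file, it prints the values rather than filling the course's Customer class, and the date pattern yyyy-MM-dd is assumed because the course's date format constant isn't shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.math.BigDecimal;
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;

import org.jdom2.Attribute;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

// Hypothetical class name; requires the JDOM 2 library on the classpath.
class JDOMGetDataSketch {

    // Assumed pattern; the course's XML date format constant may differ.
    private static final String XML_DATE_FORMAT = "yyyy-MM-dd";

    public static void main(String[] args)
            throws JDOMException, IOException, ParseException {
        // Sample data invented for this sketch.
        String xml = "<customers><customer id=\"1\">"
                + "<name>Ann</name><age>70</age><balance>12.50</balance>"
                + "<active>true</active><joined>2012-03-04</joined>"
                + "</customer></customers>";
        Document doc = new SAXBuilder().build(new StringReader(xml));

        // Walk from the root to its child elements named "customer".
        Element root = doc.getRootElement();
        List<Element> custElements = root.getChildren("customer");
        DateFormat df = new SimpleDateFormat(XML_DATE_FORMAT);

        for (Element ce : custElements) {
            // Attribute conversion; getIntValue can throw DataConversionException,
            // which is a subclass of JDOMException.
            Attribute att = ce.getAttribute("id");
            int id = att.getIntValue();

            // getChildText returns the child element's text as a String.
            String name = ce.getChildText("name");
            int age = Integer.parseInt(ce.getChildText("age"));
            BigDecimal balance = new BigDecimal(ce.getChildText("balance"));
            boolean active = Boolean.parseBoolean(ce.getChildText("active"));
            Date joined = df.parse(ce.getChildText("joined"));

            System.out.println(id + " " + name + " " + age + " "
                    + balance + " " + active + " " + joined);
        }
    }
}
```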

Searching a JDOM document with XPath


Just like the Document Object Model, JDOM is a tree-based API that stores all the data in memory at
the same time. And this lets you traverse the tree back and forth and search the data set. JDOM 2 has
support for XPath, but you'll need to combine JDOM with another library. By default, JDOM pairs with a
library called jaxen. You can get jaxen from this website at jaxen.codehaus.org/releases.html. As
described here, jaxen is an open source implementation of the XPath specification. And it's the default
implementation of XPath that JDOM 2 looks for. So to get started with XPath in JDOM, download the
binaries in zip format. I've already done that, and I've extracted the zip file to my desktop. I'll take this
JAR file, jaxen-1.1.6, and I'll copy it to the clipboard. Then I'll go to Eclipse, where I've opened a version
of my project called JDOMSearchXPath. I'll go to my libs folder and paste the file into place, then I'll
add the JAR file to my build path by right-clicking and choosing Build Path > Add to Build Path. And
now I can execute XPath searches. Next, I'll go to my main class, ReadXMLWithJDOM. And I'll
make some changes here. I'm going to change the way I call the getDataFromXML method, and pass
in an additional string. This will be an XPath expression. The double slash means search the entire
XML document. Then I'll type customer, the name of the element I'm looking for. Then, in a pair of
brackets, a predicate or filtering expression of age greater than or equal to 65. Now the way I'm calling
this method no longer matches the method's signature in my JDOMReader class. So first, I'll make sure
that I've added a comma between the two arguments. And then I'll use a quick fix to change the
method, getDataFromXML, to accept a second string argument. I'll name this new string
argument filter. Now I'll add some line feeds here to make the code more readable. Now I'm ready to add
the code to filter my data. This code goes after you've parsed the XML file and built the document in
memory. And it's going to replace these two lines of code that are getting the document's root
element and then getting all of the child elements named customer. I'll comment those two lines of
code out, and I'll add three new lines of code to replace them. First, I'll create an instance of a class
called XPathFactory. This class is a member of the JDOM library, and it's in the package
org.jdom2.xpath. And I'll call it xfactory. You can get the factory object in a couple of ways. One
approach is to call a method called newInstance and pass in a particular factory class. This lets you
choose which implementation of XPath you want to use. But if you want to use the default
implementation, and that's the jaxen library that I've already added to my project's build path, just call
the method named instance. So now I have my factory object. The next step is to create an instance of
a class called XPathExpression. This is also a member of the JDOM 2 library. And it accepts a generic
type notation; type the class Element. And I'll name this object exp, for expression. Then I'll use the factory
object, and I'm going to compile the string that's being passed into this method and turn it into a
compiled XPath expression, using this code: xfactory.compile. And you pass in two arguments: the
filter string, and then the result of a method that's a member of a class named Filters, also part of the JDOM 2
library, so be sure to import it. The name of the method you're calling is element. And now
you're ready to execute the filter. Create a list of elements. And this is exactly the same data type as was
created when you called the getChildren method earlier. So you can name it exactly the same thing,
custElements, and then assign it the result of exp.evaluate, passing in the document object. And now the list of
elements will no longer be all of the data, but only those customer elements that match your XPath
expression. I'll save the change, and I'll come back to my main class and save there, and I'll run the
code. And this time, instead of getting back all 1,000 records, I only get back the 17 customers whose
ages are 65 or over. This code is structured so that you can experiment with other XPath
expressions simply by changing the string. As I described in the chapter on DOM, the full power of
XPath and all of the things you can do with it are beyond the scope of this course. But if you know
how to implement an XPath expression using JDOM and jaxen, you can significantly reduce the
amount of code you have to write to retrieve subsets from an XML data file.
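
As a rough illustration of the three replacement lines described in this movie, here is a minimal sketch. It assumes both the JDOM 2 and jaxen JARs are on the build path, and it uses a small inline XML string with invented sample data rather than the course's data file. The class name JDOMXPathSketch is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.List;

import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.filter.Filters;
import org.jdom2.input.SAXBuilder;
import org.jdom2.xpath.XPathExpression;
import org.jdom2.xpath.XPathFactory;

// Hypothetical class name; requires JDOM 2 and jaxen on the classpath.
class JDOMXPathSketch {

    public static void main(String[] args) throws JDOMException, IOException {
        // Sample data invented for this sketch.
        String xml = "<customers>"
                + "<customer id=\"1\"><name>Ann</name><age>70</age></customer>"
                + "<customer id=\"2\"><name>Bob</name><age>41</age></customer>"
                + "<customer id=\"3\"><name>Cy</name><age>65</age></customer>"
                + "</customers>";
        Document doc = new SAXBuilder().build(new StringReader(xml));

        // The XPath expression from the narration: all customer elements,
        // anywhere in the document, whose age child is 65 or greater.
        String filter = "//customer[age >= 65]";

        // Default factory (jaxen), compiled expression, then evaluation.
        XPathFactory xfactory = XPathFactory.instance();
        XPathExpression<Element> exp = xfactory.compile(filter, Filters.element());
        List<Element> custElements = exp.evaluate(doc);

        for (Element ce : custElements) {
            System.out.println(ce.getChildText("name") + " " + ce.getChildText("age"));
        }
    }
}
```

Because the expression is just a string, other predicates can be tried by changing the value of filter without touching the rest of the code.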

Part 5: Creating and Parsing XML with StAX
48m
How StAX works
The next API in this course is StAX, or the Streaming API for XML. StAX was originally created by BEA,
and it became a part of the Java API for XML Processing in Java 6 and is now bundled in the JDK that's
delivered by Oracle. The version that's included with Oracle's JDK is only one of multiple
implementations that are available, though. As is indicated by its name, StAX is a streaming API that,
like SAX, the Simple API for XML, depends on an event-based architecture.
But unlike SAX, StAX can both read and write XML content. If you're an Android developer, it's
important to know that there is not an implementation of StAX for Android. And if you want to parse
XML using a similar coding style, you should look at the XmlPullParser, which is included in
Android. Here are some important characteristics of StAX. As I've described, it's a streaming processor.
The available implementations include the version that's incorporated into Oracle's distribution. That's
based on something called the Sun Java Streaming XML Parser, or SJSXP, and it's included with Java
SE 6 and above. Or you can get one of the more popular third-party implementations, named Woodstox,
available from woodstox.codehaus.org. All of the demonstrations that I'll do in this course will use
the version that's included in the Oracle JDK. Here's an example of using StAX. There are a couple of
different ways of reading XML with StAX. You can either use a very low-level streaming parser with a
class called XMLStreamReader, or you can use a slightly higher-level event-based parser with
XMLEventReader. They're both event based in a way, but XMLEventReader gives you more robust classes
to work with. XMLStreamReader is a tiny bit faster; XMLEventReader is a little bit easier to program
with. But they're both much faster than the equivalent tasks with a tree-based processor such as DOM
or JDOM. Just as with parsing XML, there are two approaches to creating XML. There's the
XMLStreamWriter and the XMLEventWriter. I'm not going to cover the XMLEventWriter in detail in
this course. I'll show you code for the XMLStreamWriter, which takes significantly less code. But then
for parsing XML, I'll show you both versions, the stream reader and the event reader. Here's an example of
parsing XML with StAX. This version uses the XMLEventReader class. First you would create the
reader object, an instance of the XMLEventReader class. And then you'd create objects to store your
data. Here's a bit of code. In this example, I'm using an input stream wrapped around the file. Then I'm
creating a factory object and, from that, the XMLEventReader object. Then I create a list of Customer
objects. In this example, Customer would be a plain old Java object, a POJO class that represents a single
instance of my data. I instantiate that list as an ArrayList, and I declare a Customer object that I can use to
store data about a single instance of the class. Then, to handle the data, you create your own event loop.
In this example I'm using a while loop. I'm calling a method of the reader object called hasNext, which
returns a boolean value that indicates whether there's any more XML to read, and then, if there is, I call a
method called nextEvent to advance to the next available node. The XMLEvent interface has subinterfaces
named StartElement, EndElement, Characters, and so on, and then you write some additional code to find
out what kind of event you are working with. In this example, I find out that I'm in a StartElement
and then I examine the name of the element, using the expression getName().getLocalPart(). And if I find
out that I'm on a customer element, I create a new instance of the Customer class and add it to my list.
And there would be more code here that you would use to extract attribute values and data values. To
get text from an element in StAX, one approach is to go to the next event after the element's start to get the
element's text node. And then you would get the data from the text node with a method called
asCharacters. Here's an example. This code says if I'm on the name element, then jump to the next event.
That would go to the child text node of the element. Then call the asCharacters method, which returns a
Characters object, and call its getData method to return a string, and then store that value in the customer
object that you previously created. Using this code model, once the loop is complete and the entire XML
content has been read, you'll end up with the list of Customer objects, and then you can
process the data in any way you want. Things to know about StAX include, first of all, that it's very, very
fast. In my experiments, I've found that the StAX API ranks among the fastest Java APIs for both
creating and reading XML files. And because it's a streaming API, it handles large documents very
well, because it's able to discard information from memory as it advances from node to node of the
XML content. Once again, however, there is no StAX implementation for Android, and if you tried to
import a StAX library into Android, you'd run into problems, because all of the StAX classes are
members of packages that start with javax, and Android doesn't let you do that easily. So that's some
basic information about the StAX API. As we've done with all of the APIs, let's take a look at some
code.
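
The event loop described above can be sketched with only the classes bundled in the JDK. This is a minimal sketch under some assumptions, not the code shown on screen: it collects name strings instead of filling a Customer POJO, it reads from an in-memory stream rather than a file, and the class and method names (StaxEventReaderSketch, readNames) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical class name; uses only JDK classes.
class StaxEventReaderSketch {

    // Returns the text of every <name> element found in the stream.
    static List<String> readNames(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to deliver adjacent text as a single Characters event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventReader reader = factory.createXMLEventReader(in);

        List<String> names = new ArrayList<>();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                StartElement start = event.asStartElement();
                if ("name".equals(start.getName().getLocalPart())) {
                    // Jump to the next event, expected to be the element's text.
                    XMLEvent next = reader.nextEvent();
                    if (next.isCharacters()) {
                        names.add(next.asCharacters().getData());
                    }
                }
            }
        }
        reader.close();
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Sample data invented for this sketch.
        String xml = "<customers>"
                + "<customer id=\"1\"><name>Ann</name></customer>"
                + "<customer id=\"2\"><name>Bob</name></customer>"
                + "</customers>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readNames(in));
    }
}
```

The isCharacters check guards against an empty element such as a name element with no text, where the next event would be the end element rather than a text node.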

Exporting data with XMLStreamWriter


So far I've created the beginning of an XML file, a well-formed XML string that I created with the
XMLStreamWriter class. I'm working in a project now called StAXAddData that has that beginning code.
And now I'm ready to take this list of Customer objects named data and serialize it as XML, wrapped
inside this XML structure. I'll place my cursor here, between the writeStartElement and the writeEndElement
method calls.
And I'll make some new space. I'm going to loop through the list of customers. As I've done before
with a Java list, I'll use a for-each loop. And for each customer within the loop, I'll create an XML
element that represents that customer's data. Because this will take a good bit of code, I'm going to create
a separate method for it. The name of the method will be createCustElement, and it'll take these
arguments: first the XMLStreamWriter, which I've named writer, and then the customer object.
I'll type in the call to the method, which hasn't been created yet, then I'll use a quick fix and I'll create
the method. That's created down at the bottom of the class. The first thing I'll need to do is to create the
start element for the customer. So I will take this writer object and once again I'll call
writeStartElement, and I will pass in the name of the element, which will be customer. Just as with all the other
methods that are using the XMLStreamWriter, I'll need to add a throws declaration.
So I will move the cursor to that line and press Ctrl+1, or Cmd+1 on Mac, and add the throws
declaration to this method. After you write the start element, but before you start creating child
elements, you need to add any attributes. In my XML structure, the ID of the customer is
represented as an attribute of the customer element. So this is where you would write it. Call a method
called writeAttribute. There are a number of versions of this method, and we'll use the one with a local name
and a value, both strings.
For the local name, I'll pass in id, which is represented by the constant ID that's a
member of the Customer class. And then for the value I'll pass in customer.getId. That returns an
integer, and I need it to be a string, so I'll wrap it in Integer.toString. This follows the same sort of
pattern that I've used with previous APIs such as DOM and JDOM. Next, I'll create a child element. I'll
use the writeStartElement method again.
And this time, I'll create the name element, which is a child of customer. Notice that I haven't ended the
customer element yet. You simply nest the code, one element within the next. I'll get the name of this
element from the constant Customer.NAME. Next, you'll need to write some text. In StAX, a text
value is known as a characters value. And you write it using a method called writeCharacters. It looks like this:
writer.writeCharacters, and you can either pass in a string or a character array with a start and a length value.
I'll use the version where I'm passing in a string value, and I'll get the value from customer.getName,
and then end that element, calling writer.writeEndElement. Now, these three lines of code will be repeated
over and over again, once for each child element of the customer element. So I'm going to take those
three lines and refactor them, extracting them to their own method. I'll select them and right-click,
choose Refactor > Extract Method, and I'll name this new method writeDataElement.
When you first create the method, it will once again receive an XMLStreamWriter and the customer. But
now, to make this method reusable, add a third argument to the method. We'll call it elementName, and
then we'll use that value when we create the start element. And then, instead of passing in the entire
customer object, we'll change this argument to a string named value. And that's what we'll use when
we call writeCharacters. Now I'll come back up here, where I'm calling the new method, and I'll pass in
customer.getName and then the name of the element, which will be Customer.NAME.
Before you finish the customer element, be sure to end that element. Place the cursor after that code
where you created the child element. And call writer.writeEndElement. Now this code isn't complete
yet. But let's test it so far. Let's track through what's happening. At lines 24 and 25 in my code, I'm
starting the document. And starting the root element. Then, I'm looping through the list of customers
and creating one customer element for each customer.
In this method, I'm creating the start element for customer, writing one attribute and one child element,
and then ending the customer element. And in the code to write the data element, also known as the
child element, I'm starting an element, I'm writing the characters value, and I'm ending the element. So
now I can go back to my main class and run the code, and now I get a well-formed XML string that
contains the IDs and the names of each and every customer. So now your job is to complete this code,
following the same model set out here. Here are the rules. Each of the child elements of the customer
element must have a data value. Most of them will be written using the writeCharacters method. For
the one value that should be wrapped in a CDATA section, use a method called writeCData instead.
One approach is to take the writeDataElement method and make it conditional. Pass in another
argument, a boolean value indicating whether you want the value to be written as characters or as
CDATA. And at the end of the process, you should have a well-formed XML string. It still won't have
any indentation or pretty printing, and I'll show you how to accomplish that later on, but it should be a
well-formed XML string that can be written out to a file or shared over the web in other ways.
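
One possible shape for the conditional helper suggested above is sketched here. It is not the course's solution: the method takes plain values instead of the Customer POJO, the choice of about as the CDATA field is an assumption, and the names StaxWriteDataSketch and createCustomerXml are hypothetical. Only JDK classes are used.

```java
import java.io.StringWriter;

import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical class name; uses only JDK classes.
class StaxWriteDataSketch {

    // Writes <elementName>value</elementName>, either as escaped character
    // data or wrapped in a CDATA section, depending on the flag.
    static void writeDataElement(XMLStreamWriter writer, String value,
            String elementName, boolean asCData) throws XMLStreamException {
        writer.writeStartElement(elementName);
        if (asCData) {
            writer.writeCData(value);
        } else {
            writer.writeCharacters(value);
        }
        writer.writeEndElement();
    }

    // Builds a document with one customer element. The "about" field is
    // assumed to be the value that belongs in a CDATA section.
    static String createCustomerXml(int id, String name, String about)
            throws XMLStreamException {
        StringWriter sw = new StringWriter();
        XMLStreamWriter writer =
                XMLOutputFactory.newInstance().createXMLStreamWriter(sw);

        writer.writeStartDocument();
        writer.writeStartElement("customers");

        writer.writeStartElement("customer");
        // Attributes go after the start element and before any children.
        writer.writeAttribute("id", Integer.toString(id));
        writeDataElement(writer, name, "name", false);
        writeDataElement(writer, about, "about", true);
        writer.writeEndElement(); // ends customer

        writer.writeEndElement(); // ends customers
        writer.writeEndDocument();
        writer.flush();
        writer.close();
        return sw.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(createCustomerXml(1, "Ann", "Likes <b>XML</b>"));
    }
}
```

In a full version, createCustomerXml would loop over the list of Customer objects and call the helper once per child element.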

Creating an XML string with XMLStreamWriter


The Streaming API for XML has two ways of creating XML documents: the XMLStreamWriter and
the XMLEventWriter. I'm going to focus on the XMLStreamWriter, because it involves less code and is a little
bit faster. I'm working in a project called StAXCreateDocument. The main class has a main method that
gets some data from my data provider class, which is part of the data provider project that's linked to this
project, and returns the data as a list of Customer objects.
The Customer class is also a part of the data provider project. The main method then creates an instance of my custom
class that I've called StAXStreamCreator, and then calls its createDocument method. It passes in the
data list and a string representing the name of a file that we want to create. The createDocument
method isn't doing anything yet. It simply receives the data and the file name. And it has a constant
named XML date format, which is exactly the same as the constants that I have used in previous
chapters.
So here are the first steps to follow in creating a document using StAX. First, create an instance of a
class called XMLOutputFactory. Be sure to import the class. You'll see that it's a member of a package
named javax.xml.stream. This class and its related classes are all bundled in the Oracle JDK. I'll give
this object a name of factory, and I'll get its reference by calling a static method of the class,
XMLOutputFactory.newInstance.
Now before I show you how to output to a file, I'll first show you how to create an XML string. I'll
create a StringWriter object, be sure to import it, and I'll name it sw and instantiate it with the
no-arguments constructor. Next, I'll create an instance of a class called XMLStreamWriter. I'll name it
writer and I'll get its reference by calling a method of the factory object named
createXMLStreamWriter. You'll see that there are a number of versions of this method.
You can pass in an output stream, a result, or a writer. I'm going to use the version that accepts a writer
argument, and I'll pass in sw. So now, as I call the methods of the XMLStreamWriter, the result will
be written to this StringWriter, and I'll be creating an XML string. Here are the steps to create the
XML document itself. The XMLStreamWriter has a set of methods called writeStartDocument,
writeStartElement, writeEndElement, writeEndDocument, and so on.
I'll start by creating the document. I'll call writer.writeStartDocument. There are a few different
versions of the method. I'll use the version with no arguments to get a default XML document. Next, I'll
create the root element. I'll call a method called writeStartElement. And I'll pass in the local name,
that's the name of the element. That will be customers. At this point I would start writing the data
elements, the elements in this case named customer. But I'm going to skip that part for this step and I'm
going to finish the document. I'll call a method called writer.writeEndElement, and then one called
writeEndDocument. Before you finish the process, be sure to close the writer. I'll call the writer object's
flush method and then its close method, and the results will now be written to the string. Finally, I'll
use System.out and I'll output sw.toString. Now I'll save that change and see that I have a bunch of
potential exceptions. And I'll handle those by adding a throws declaration to the method. I'll move the
cursor up to the first line that has the exception warning, press Ctrl+1, or Cmd+1 on Mac, and add a
throws declaration. And I'm throwing an instance of XMLStreamException, and that clears all of the
potential exceptions. All of the methods of XMLStreamWriter can potentially throw this same
exception. Now I'll come back to my main class, and I'll add a throws declaration here as well. I'll save,
and that clears all of the warnings, and now I'll see what happens. When I run the code, I get a well-formed
XML string. It has the XML declaration, and then the root element, and nothing else. So if
you've gotten this far, you've successfully used the XMLStreamWriter class. You've created a well-formed
XML document, and now you're ready to pack it with data. And I'll show you how to do that in
the next movie.
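
Put together, the steps in this movie come to roughly the following. This is a minimal sketch using only JDK classes. The class name StaxCreateStringSketch is hypothetical, and the method here takes no arguments, unlike the course's createDocument, which receives the data list and a file name.

```java
import java.io.StringWriter;

import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical class name; uses only JDK classes.
class StaxCreateStringSketch {

    // Creates a document containing only the XML declaration and an
    // empty customers root element, and returns it as a string.
    static String createDocument() throws XMLStreamException {
        XMLOutputFactory factory = XMLOutputFactory.newInstance();
        StringWriter sw = new StringWriter();
        XMLStreamWriter writer = factory.createXMLStreamWriter(sw);

        writer.writeStartDocument();           // default XML declaration
        writer.writeStartElement("customers"); // root element
        // The customer elements would be written here.
        writer.writeEndElement();
        writer.writeEndDocument();

        writer.flush();
        writer.close();
        return sw.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(createDocument());
    }
}
```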

Formatting documents with StAX utility classes


In the previous movie, I described how to fill the XML document with data using StAX. And I showed
you the beginning of the solution but not the end. In this project, StAXFormatXML, I have the
completed code in the class StAXStreamCreator. In addition to the ID attribute and the customer
NAME, I am now writing out the PHONE, ABOUT, AGE, BALANCE, ACTIVE, and JOINED values,
all taking into account the various properties' data types in my Customer POJO class.
Right now, I'm writing out XML without formatting, so it's all compacted together. And just as with
DOM or JDOM, if you're trying to create XML to send over the web, you might want to keep the XML
compacted like this. But in many cases, you'll want to format it, adding indentation, creating an XML
style sometimes known as pretty printing. There's a class available named
IndentingXMLStreamWriter, and it's found in some StAX implementations, but not all.
And specifically, it's not included in the Oracle JDK, version 7. So if you want to format your XML
using the simplest possible approach, you'll need to go get a new JAR file. You can find this JAR file
on this website at java.net/projects/staxstatsutils/downloads. Download the most recent stable
version; as of the time of this recording, that was staxutils20070216.
So you can see that this is a very stable library that has not been undergoing recent updates. I have
already downloaded the ZIP file to my desktop and extracted it. I'll find this JAR file, stax-utils,
and I'll copy that JAR file to the clipboard, then I'll go back to Eclipse, to the Package Explorer, and I'll
paste that JAR file into the libs folder. Then I'll add it to the project's build path by right-clicking on
it and choosing Build Path > Add to Build Path. If you're working in an IDE other than
Eclipse, follow that IDE's process for adding the JAR file to your build path. Next, I'll go to the class
StAXStreamCreator. This is where I'm writing out my XML. Currently, I'm writing it out to a
string, using the XMLStreamWriter and Java's StringWriter class. Now, because I have that JAR file
in my build path, I can create an instance of the class IndentingXMLStreamWriter, which is a member
of the package javanet.staxutils. For the moment, I'm going to name this object writer, which will
collide with the XMLStreamWriter above. But the goal is to create the
IndentingXMLStreamWriter and wrap it around the XMLStreamWriter, so I'm just going to rename
the original one as w. Don't use Eclipse's refactoring; you don't want to change all the references. And then
come back down here to where you're creating the IndentingXMLStreamWriter, and instantiate it with
the class's constructor, starting with new, and then the class, IndentingXMLStreamWriter, and
wrap it around the XMLStreamWriter, w. And that's it. The IndentingXMLStreamWriter class has a series
of set methods that you can use to control indentation levels, how new lines are written, and other
features. But we'll use it with its default behavior. I'll save my changes. I'll come back to my main class
and run. And there's the result. The XML is output, and it's pretty printed with indentation. Now again,
the IndentingXMLStreamWriter class is included in some StAX distributions, just not in the default
distribution that's included with the Oracle JDK. But it's incredibly simple to use, and if you need
formatted XML, it's the easiest way to get the job done.

Outputting an XML file with XMLStreamWriter


Our StAX code is now successfully writing out an XML string that is well formatted, and the next step
is to write to a file instead of to a string. I am working in a project called StAXToXMLFile, and it has
all the code to create the XML document, pack it with data, and format the output. In order to create the
xmlString, I'm using the StringWriter class and passing that into createXMLStreamWriter. To write a
file instead, use the FileWriter class.
So I'm going to comment out this line of code that's creating the StringWriter, and I'll replace it with
code that creates a FileWriter. This is the standard Java FileWriter class. I'll call it fw for FileWriter, and I'll
instantiate it, wrapping the constructor around a new instance of the File class, and I'll pass into
that the file name argument that was passed into this method. So now the FileWriter knows how to
create the file and write to it.
Then, I'll come down here, where I'm creating the XMLStreamWriter, and instead of writing to the
StringWriter, I'll write to the FileWriter. I have one little bit of code cleanup. Down here, I'm returning the
contents of the StringWriter, which no longer exists. So instead, I'll simply output a string of "All done". I have one
issue on this line, where I might be throwing an exception. So I'll place the cursor on that line, use
a quick fix, and add to the method's throws declaration.
The FileWriter constructor can throw an IOException, so I'll add that to this method and save that change. Then I'll
need to add to the throws declaration of my main method as well, following the same process, and now my
code is clean without any errors. I have one warning in my Problems view. I'll double-click to deal with
that and get rid of this unused import statement, and now I'm ready to test. I'll save all of my changes,
and before I run the code, I'll make sure that my output folder is empty. Then I'll click the Run button.
I get the message "All done". I'll go to the Package Explorer and refresh, and there's the new file,
customers.xml. Now remember that one of the goals of using a streaming API is performance. And the
StAX API is definitely faster than, say, the Document Object Model, particularly when working with large
data sets. So let's do a little bit of performance testing. I'll go back to my main class,
CreateXMLWithStaX, and I'll place the cursor after the code that's retrieving data from my JSON file,
which is part of the data provider project.
I don't care right now how long that takes. What I care about is how long it takes the StAX writer to do
its job. So I'll create an instance of my Stopwatch class, which is a part of the data provider project.
I'll name it watch, instantiate it, and call its start method, passing in a label of "Create XML
with StAX". Then, after the document has been created, I'll call watch.stop. Notice that I'm only
retrieving 10 rows of data.
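The Stopwatch class itself isn't shown in this movie. As a rough sketch only, a minimal timer with the same start-with-a-label and stop calls could look like the following. The class name StopwatchSketch and everything inside it are assumptions for illustration, not the course's actual code.

```java
// Hypothetical stand-in for the course's Stopwatch class (its real source is not shown).
class StopwatchSketch {
    private String label;
    private long startNanos;

    // Remembers the label and the starting time.
    void start(String label) {
        this.label = label;
        this.startNanos = System.nanoTime();
    }

    // Prints a one-line report and returns the elapsed time in milliseconds.
    long stop() {
        long elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000;
        System.out.println(label + ": " + elapsedMillis + " ms");
        return elapsedMillis;
    }
}
```

To use it, call start just before the code that writes the XML and stop right after it, so the time spent loading the JSON data is excluded.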
I'll run the code, and it tells me that that took 14 milliseconds, but let's see what happens with a larger
data set. My customers.xml file is a very small file, but what happens if we try to deal with 50,000 rows
of data? In my call to the data provider's getData method, instead of asking for a SMALL data set, I'll
ask for a LARGE data set. And I'll change the name of the file that I'm creating from customers.xml to
customers large.xml, and I'll run the code. It takes a little bit longer. But notice that even with 50,000
rows of data, it only took about a second to accomplish. Now as always, performance can differ greatly
depending on available resources, disk speed, and so on. You need to do your own performance testing
and see what is best for your application with your particular data structure. But all things being equal,
you'll typically see that creating XML files, particularly larger XML files, is much faster with StAX than
it is with DOM or JDOM.
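To make the StringWriter-to-FileWriter switch concrete, here is a minimal sketch that uses only the StAX classes in the JDK. The class name, the method signature, and the reduced customer data (just an id and a name) are assumptions made for illustration; the course's StAXStreamCreator writes more fields and wraps the writer for indentation.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StaxFileWriterSketch {

    // Writes a <customers> document to the named file; map keys are ids, values are names.
    static void writeCustomers(String filename, Map<Integer, String> customers)
            throws IOException, XMLStreamException {
        // try-with-resources closes the FileWriter even if writing fails
        try (Writer fw = new FileWriter(filename)) {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(fw);
            writer.writeStartDocument();
            writer.writeStartElement("customers");
            for (Map.Entry<Integer, String> entry : customers.entrySet()) {
                writer.writeStartElement("customer");
                writer.writeAttribute("id", String.valueOf(entry.getKey()));
                writer.writeStartElement("name");
                writer.writeCharacters(entry.getValue());
                writer.writeEndElement(); // closes <name>
                writer.writeEndElement(); // closes <customer>
            }
            writer.writeEndElement(); // closes <customers>
            writer.writeEndDocument();
            writer.flush();
            // Closing the XMLStreamWriter does not close the underlying FileWriter.
            writer.close();
        }
    }
}
```

Because nothing is held in memory beyond the current element, the same loop works unchanged whether the map holds 10 entries or 50,000.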

Parsing an XML file with XMLStreamReader


Once you've created code to open a file and loop through it using streaming events, the next step is to
add code to extract data from your XML and store that data as native Java objects. I'm working in a
version of my project called StAXGetDataWithStream. I'm going to comment out the code that I used
for debugging. That's the code that output the name of each event as a string. I won't need that in my
final production code. Then I'll add code here, within the while loop.
The first step within the while loop is already in place: calling the next method to go to the next event,
and getting the return value as an integer that I've named eventType. Now I need to find out which
event type I'm working with. To read a well-formed XML file like this one, you only really need to deal
with one event type, START_ELEMENT, and I'll show you how to get all the other data you need without
having to deal with the characters, end element, or other events. I'll add a conditional block, an if
statement, and I'll set the condition as follows.
I'll start with eventType, and then I'll compare that integer value to a constant of an interface named XMLEvent.
This interface is a member of javax.xml.stream.events and must be imported. The interface has a bunch
of constants, each representing one of the possible events. There are constants for ATTRIBUTE, CDATA,
CHARACTERS, and so on. But again, the only event type I'm interested in is START_ELEMENT. Now, I need to
find out which element I'm on.
And I'll find out by calling a method of the XMLStreamReader called getName. I'll create a string
variable that I'll call elementName, and I'll get its value by calling reader.getName. The getName method
returns an object called a QName. To get the equivalent string value, call toString on that
object. So now, I know which element I'm working with. The first element I'm interested in is the
customer element. In my XML file, each customer element has an ID attribute and a bunch of child
elements.
Because I'm working in Java 7, I can use a switch statement to evaluate the element name string.
The key for the switch statement will be my new variable elementName. And then the value for each
case will be one of the element names I'm looking for. The first will be simply named customer, and I'll
use a literal string for this one. When I hit the customer element, I'll take two critical actions. First, I'll
create an instance of the Customer class.
I'm going to declare that instance up here, outside of the while loop, so that I can address the object
from anywhere in this method. I'll declare it initially as null, but then I'll come down here to the case
statement for customer and I'll instantiate it with the no arguments constructor. Then I'll add that object
to my list with data.add. The other critical action when you're dealing with an element that has
attributes is to grab those attribute values.
Each of my customer elements has an attribute named ID, which represents an integer value in my
Customer class. So I'll come back to my code, and I'm going to do this all in a single line. I'll call the
setter method for the ID property of the Customer class; that's setId. When you retrieve a value from
an attribute, it'll come to you as a string. But I need the value to be an integer, so I'll use
Integer.parseInt. And then finally I'll get the attribute value.
I'll call a method of my XMLStreamReader object; that's reader. And the name of the method is
getAttributeValue. Notice that there are a few different ways of dealing with attributes, but attributes in
XML aren't supposed to be in a particular order, and many of these methods expect you to know the position of
the attribute in the element. For absolute safety, I recommend using the getAttributeValue method,
which accepts two arguments: a namespace URI and a local name.
If you're dealing with an XML file that has namespaces, you can pass that value in here. But if, like this
XML file, there aren't any namespaces, just pass in a blank string for the first argument. Then pass in
the name of the attribute, and I will get that from the constant Customer.ID. That's all the code you
need to retrieve the attribute, convert it to an integer, and save it in your data object. Next, you'll deal
with the child elements of the customer element.
In this XML file, each customer has a name, phone, about, and so on. To get each value and save it in
your data object, first add a case statement. I'll create the case statement for the name. Each time you
create a new case statement, be sure to add the break statement at the end, and then place your cursor
between the case and the break. To get either a text node or a CDATA section that's a child of an
element, you can call a simple method called getElementText.
I know that I have a customer object, because I can't get to a name before I've gotten into a customer
element. So within this case statement, I'll call customer.setName, and then I'll call the reader object's
getElementText method, and that's it. The value that's a part of the element is retrieved and saved
into my data object. I'll code up one more of these, and this will be for the joined value. I'll duplicate
this case block, and then for the new version, I'll change the name of the element I'm looking for to
joined and I'll change the setter method that I'm calling to setJoined.
Just as in earlier examples in this course, when you retrieve a date value, you need to parse it to turn it
into a Java Date. So I'll follow the same sort of coding model I've used in the past. I'll create an
instance of the DateFormat class that I'll name df. I'll instantiate it using new SimpleDateFormat,
and I'll pass in my constant XMLDATEFORMAT that's declared at the top of the code. And then to
pass a Date object into setJoined, I'll call the DateFormat object's parse method and wrap it around
getElementText.
As before when I used date formats, you'll need to deal with the possible exception. I'll use a quick fix and
surround this code with a try/catch. And now, I'll go back up to the top of the code and get rid of this
SuppressWarnings annotation that was on XMLDATEFORMAT, and I'll clean up my imports and make
sure I only have the ones I want. I'll save all of my changes, come back to my main class, and run it,
and there's the result. I'm successfully parsing the data, retrieving the ID, the name, and the date that the
customer joined.
I'll leave it up to you to fill in the rest of this code. For each additional child element of the customer
element, add a case statement for that child element. And where necessary, convert the value to the
appropriate data type: an Integer, a BigDecimal, or a Boolean. You'll find that CDATA sections work
exactly the same as text nodes. You can call the getElementText method of the XMLStreamReader,
and it'll return a string either way. But you'll need to add your own explicit conversion code for the
numeric and Boolean data types.
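Pulling the steps of this movie together, the parsing loop can be sketched roughly as follows. This is not the course's code: the nested Customer class is a reduced, hypothetical stand-in for the course's POJO, the date pattern is an assumption (the value of XMLDATEFORMAT isn't shown), the XML is read from a string rather than a file so the example is self-contained, and getLocalName is used in place of getName().toString().

```java
import java.io.StringReader;
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxParseSketch {

    // Reduced, hypothetical stand-in for the course's Customer POJO.
    static class Customer {
        int id;
        String name;
        Date joined;
    }

    // Assumed date pattern; the course's actual constant value is not shown.
    static final String XML_DATE_FORMAT = "yyyy-MM-dd";

    static List<Customer> parse(String xml) throws XMLStreamException, ParseException {
        List<Customer> data = new ArrayList<>();
        DateFormat df = new SimpleDateFormat(XML_DATE_FORMAT);
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        Customer customer = null; // declared outside the loop so every case can reach it

        while (reader.hasNext()) {
            int eventType = reader.next();
            if (eventType != XMLStreamConstants.START_ELEMENT) {
                continue; // only start elements matter for this document shape
            }
            switch (reader.getLocalName()) {
                case "customer":
                    customer = new Customer();
                    data.add(customer);
                    // A null namespace means "don't check the namespace" per the StAX Javadoc;
                    // the transcript passes an empty string for the same purpose.
                    customer.id = Integer.parseInt(reader.getAttributeValue(null, "id"));
                    break;
                case "name":
                    // getElementText returns text and CDATA content alike
                    customer.name = reader.getElementText();
                    break;
                case "joined":
                    customer.joined = df.parse(reader.getElementText());
                    break;
                default:
                    break;
            }
        }
        reader.close();
        return data;
    }
}
```

Additional child elements follow the same pattern: one more case label, one call to getElementText, and an explicit conversion where the target type isn't a string.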

Getting data from XML with XMLStreamReader


The Streaming API for XML has two architectures for parsing XML, the stream reader and the
event reader. I'll start with the stream reader, which uses a single loop pulling data from the XML
file. In this project, StAXReadWithStream, I have a beginning main class named
ReadXMLWithStAXStream. In its main method, I'm creating an instance of my StAXStreamReader class. This class
has a getDataFromXML method, which receives the name of a file and returns a list of customer objects.
This is the same Customer class that I've been using in other parts of this course, and it's a part of the
data provider project that's connected to this project. To get started, I'll create a few objects. I'm going
to create an instance of the standard Java InputStream class. And then I'll use two types that are a
part of StAX, named XMLInputFactory and XMLStreamReader. I'll start with the InputStream. I'll
declare a variable of this type, and I'll name it in.
And I'll instantiate it by creating the concrete class, FileInputStream. I'll pass into that object's
constructor a new File object, which will be wrapped around the name of the file. So now I
have a way of opening the file. Next, I'll create a factory object. This is an instance of a class called
XMLInputFactory, and it's a member of the package javax.xml.stream. I'll name it factory, and I'll
instantiate it with a static method of the class named newInstance.
As with all factories, you now have an option of adding certain features, but I'll use a default factory
object. And I'll use it to create something called an XMLStreamReader. Once again, this is a member of
the stream package. I'll name this reader, and I'll get its reference by calling
factory.createXMLStreamReader. You'll see that you can wrap this around an input stream, a reader, or a source object. I'll
use the version that's expecting an input stream, and I'll pass in the in object that I already created.
Now you're ready to read the file. This is a pull parser, which means that you are in charge of saying
when you want to get more data. In order to move forward through the XML file, you'll call a method
of the reader object named next, and that will go to the next significant part of the XML file. Examples
of significant parts of XML are the start of the document, the start or end of an element, some
characters, and comments or other nodes.
When you call the next method, you'll get back an integer, which is the event type. So I'm going to
declare an integer variable that I'll name eventType, and I'll get its value by calling reader.next.
Now, to find out what the event type is, you'll need to translate that integer into a string. The core StAX
API doesn't have any code that'll do that for you, but the StAX utilities library that I added to my
project in an earlier demo does have a useful class that will help you out.
You can use a class called XMLStreamUtils that has a static method called getEventTypeName. I'll do some
system output, and I'll pass in XMLStreamUtils, and I'll call the method getEventTypeName, and I'll
pass in the event type. And that'll tell me which event I just got to. I have a few error indicators, so I'll
deal with all those by adding a couple of throws declarations. First, I'll deal with the input stream, and
I'll add a throws declaration, and that's for the FileNotFoundException.

Then I'll go to the next error and use a quick fix, and add another throws declaration. This one is for
XMLStreamException. I'll save the changes, and that clears all the errors from this file. I'll go back to
my main class, and my main method already has the throws declarations that I needed. So I'm ready to
test. I'll run the code, and I see that I get the first event and the name of the event, which is start
element. So now let's see what happens when you loop through the entire file.
I'll go back to my StAXStreamReader class, and I'll place the cursor after the output of the event name.
And I'll create a while loop. I'm going to use a code template that's a part of Eclipse called while (iterate
with iterator). It'll get some of the code right, but not all. The condition of the while loop will be based
on a call to a method called reader.hasNext. This follows the iterator pattern. It has a next method that
moves forward through the iterator.
And a hasNext method that tells you whether more content is available. Within the loop, I'll reuse the
event type, so I don't need to re-declare its data type. And I'll get its value by once again calling
reader.next. Then I'll take this bit of code that's outputting the event type as a string. I'll duplicate it and
move this version down a few lines so it's inside the loop. And now I'll be reporting every event that I
hit. And when the hasNext method returns a false value, I'll jump out of the loop and return whatever
data I've collected.
I'll save and go back to my main class and run again. And now as I loop through the document, I see
that I'm getting a variety of events, start elements, end elements, characters events and end document at
the very end. So once you have this code in place you know that you are able to read the XML file. And
the next step is to add code to collect the data from the XML file, and store it in native Java objects.
And I'll get into those steps in the next movie.
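As a self-contained sketch of the loop described in this movie, the following reads XML from a string instead of a FileInputStream and reports every event it hits. Since the stax-utils helper isn't part of the JDK, the eventName method below is a hand-written, hypothetical substitute that covers only the common event codes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxEventLoopSketch {

    // Hypothetical stand-in for the stax-utils helper: maps common event codes to labels.
    static String eventName(int eventType) {
        switch (eventType) {
            case XMLStreamConstants.START_DOCUMENT: return "START_DOCUMENT";
            case XMLStreamConstants.END_DOCUMENT: return "END_DOCUMENT";
            case XMLStreamConstants.START_ELEMENT: return "START_ELEMENT";
            case XMLStreamConstants.END_ELEMENT: return "END_ELEMENT";
            case XMLStreamConstants.CHARACTERS: return "CHARACTERS";
            case XMLStreamConstants.CDATA: return "CDATA";
            case XMLStreamConstants.COMMENT: return "COMMENT";
            case XMLStreamConstants.SPACE: return "SPACE";
            case XMLStreamConstants.PROCESSING_INSTRUCTION: return "PROCESSING_INSTRUCTION";
            default: return "OTHER(" + eventType + ")";
        }
    }

    // Walks the whole document and returns the name of every event in order.
    static List<String> listEvents(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> events = new ArrayList<>();
        // A new reader is already positioned on the start of the document.
        events.add(eventName(reader.getEventType()));
        while (reader.hasNext()) {
            // next() advances the parser and returns the new event type as an int
            events.add(eventName(reader.next()));
        }
        reader.close();
        return events;
    }
}
```

The loop stops on its own because hasNext returns false once the end-of-document event has been delivered.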

Parsing an XML file with XMLEventReader


As I previously described, StAX allows you to use either a streaming or an event-based programming
model to parse XML. The styles are so similar to each other, though, that to show you the event model
I'll start with finished code that's using the class XMLStreamReader, and I'll show you how to change it
to use XMLEventReader. I'm working in my project called stAXEventReader, and I'll open the class
called StAXEventReader. This has code that is using XMLStreamReader.
The first step in converting is to change the name of the interface you're using. I'll change from
XMLStreamReader on line 32 to XMLEventReader. Just like XMLStreamReader, this has to be imported. It's in
the package javax.xml.stream. And to get the right object, you'll need to change the factory method
that you call. Instead of createXMLStreamReader, call createXMLEventReader.
Next, I won't need these two lines of code. They're starting the reading process, and with the event
reader, you can simply ask whether something is available, so I'll get rid of those, and work just within
the while loop. Next, I'll change how I advance through the XML file. With a stream reader, you called
a method named next, and it returned an integer indicating the next event type. With the event reader,
call a method of the reader object called nextEvent.
It will return an instance of XMLEvent. The same interface that had constants that I used in streaming.
But now, I'll get a full event object. I'll name it event and I'll call reader.nextEvent. And that moves me
first to the start document, then to the first start element, and so on. Next, just like I did with streaming,
I need to evaluate which event I'm on. I'm not getting any integer event type anymore. Now I'm getting
an event object.
And I can find out if I'm on the right event by changing the condition of the if statement to
event.isStartElement. The XMLEvent object has one is method for each of the possible event types, and I'll choose
isStartElement. And now my logic is the same as it was with streaming. The next step is to get the
name of the element, and this code looks a little bit different. I'll comment out the streaming version,
and I'll replace it with the following.
The XMLEvent interface is extended by a number of other interfaces. There's one interface for each of
the events: StartElement, EndElement, Characters, and so on. I'll create an instance
of StartElement that I'll name se, and I'll get its reference by calling a method of the event object
called asStartElement. The event object can be cast as a start element, a characters event, or an end
element event. Then to get the name of the element, I'll once again create a string called elementName
and I'll get its value by calling a method of the StartElement interface.
It's called getName, and from there I'll call a method called getLocalPart. This expression will work
fine with XML files that don't have namespaces. I have one more change to make, and that's in how I
get the attribute value. The XMLStreamReader had a method called getAttributeValue. But when
working with the event reader, there are a few more steps. I'll start here, after I've created the Customer
object. I'll create an instance of a class called QName, that's a member of javax.xml.namespace.
I'll call it qName, starting with a lowercase q. And I'll instantiate it by wrapping it around the name of
the attribute I'm looking for, and that's Customer.ID, so now I have my QName object. The next step is to
retrieve the attribute value, and I'll do that by calling a method called getAttributeByName. That's a
member of the StartElement interface. I'll create a string called idAsString, and I'll get its value by calling
se.getAttributeByName, and I'll pass in the qName object.
That returns an instance of the Attribute interface. I want a string value, so from there I'll call
getValue. And then finally, I'll take that value, parse it as an integer, and pass it to the customer object.
So handling of attributes is a bit different for the event reader than it is for the stream reader. Now for
all of the other values, you can leave the code as is, because the XMLEventReader has the same
getElementText method as XMLStreamReader. There are other ways of getting that text.
For example, you can get the next event and then retrieve the data directly, but that code just isn't
necessary. I'll clean up my code by removing unneeded imports, then I'll save and come back to my
main class, ReadXMLWithStAXEvents. Be sure you've opened the main class that's using the
StAXEventReader class I was just working on. When I run the code, I see that I'm successfully
retrieving data. Now again, the event reader and the stream reader are very similar to each other.
There aren't significant performance benefits of one over the other. If there is a difference, it's that
the stream reader creates fewer objects in memory. But they're both very fast, and because they're
streaming, they're able to discard objects that they no longer need as they read through the file from
beginning to end. They're both very memory efficient and very fast. And so which you use, the event
reader or the stream reader, is really a matter of coding preference.
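For comparison with the stream-reader style, here is a minimal sketch of the event-reader changes described in this movie. To stay short it collects only each customer's id and name into a map, reads from a string, and uses a literal "id" where the course uses a constant; those simplifications, along with the class and method names, are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class StaxEventReaderSketch {

    // Returns a map of customer id to customer name, in document order.
    static Map<Integer, String> parse(String xml) throws XMLStreamException {
        Map<Integer, String> result = new LinkedHashMap<>();
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        Integer currentId = null;

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent(); // a full event object, not an int code
            if (!event.isStartElement()) {
                continue;
            }
            StartElement se = event.asStartElement();
            String elementName = se.getName().getLocalPart();
            switch (elementName) {
                case "customer":
                    // Attributes are looked up by QName rather than by (namespace, name) strings
                    Attribute idAttribute = se.getAttributeByName(new QName("id"));
                    currentId = Integer.parseInt(idAttribute.getValue());
                    break;
                case "name":
                    // Valid here because the last event returned was a start element
                    result.put(currentId, reader.getElementText());
                    break;
                default:
                    break;
            }
        }
        reader.close();
        return result;
    }
}
```

Apart from nextEvent, asStartElement, and the QName lookup, the control flow is the same as in the stream-reader version.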

Parsing XML in Android with XmlPullParser


The StAX API is fast and easy to work with, but it's not easily available to Android developers. The
Android developer kit doesn't include an implementation of StAX, and the rules of Android say that any
classes that are members of a package starting with javax can't be easily added to an Android project.
But specifically for parsing XML files, there is an alternative. It's called the XmlPullParser. It's based
on the API that's defined at this website, xmlpull.org, and you'll find the documentation for this class
and its associated classes in the Android documentation, at this webpage on developer.android.com.
You'll find all the details and some sample code here. The XmlPullParser coding style looks very
similar to the StAX style. You start with the factory, and from there you create a parser. Then you
create an input stream and add that stream to the parser. In this example, the input stream comes from
an embedded resource that's a part of an Android project, but you can use the XmlPullParser as well
with a file that's stored on a device's persistent storage, or XML content that you download from the
web.
Just like the StAX streaming parser, the XML pull parser depends on a while loop. Each time through
the loop, you're looking for an event, and examining the event type. And just like the StAX version, the
pull parser has a next method that says, go to the next event, or the next significant node, in the XML
content. The details of the XML pull parser are beyond the scope of this course, because I don't want to
get into the complications, of setting up an Android development environment.
But I already have some content on using the XmlPullParser in another course. It's called Android
SDK: Local Data Storage, and it's available in the lynda.com library. Look at chapter three, Using
Internal and External File Storage, and you'll find that there is a movie there called Parsing a Read-Only
XML File with XmlPullParser. You'll find a lesson on JDOM there as well, but that duplicates what
I've already covered in this course. So if you want really fast parsing in Android, and you want to
minimize the amount of memory by using a streaming parser, try the XmlPullParser.

Part 6: Creating and Parsing XML with JAXB
24m

Comparing XML binding with other programming models


JAXB stands for Java Architecture for XML Binding. It's a relatively recent API for XML processing.
And like DOM, SAX, and StAX, it's included in the Oracle Java SE distribution. It's a part of Java SE 6
and above. And just like StAX, there are multiple implementations available. JAXB is both a reading
and a writing API. You can use it to both create and parse XML content.
The programming architecture is based on creating annotations in plain old Java object classes, to map
the members of those classes, the properties or fields, to elements or attributes in the XML content. The
actual amount of procedural code you write to process XML with JAXB is very small. All of the logic
is in the annotations. JAXB is not available for the Android SDK. That's because it's a memory-intensive
API and not really appropriate for resource-constrained devices.
However, if you like the annotation-based syntax of JAXB, take a look at Simple, also known as the
Simple XML Serialization framework, which I describe in a later chapter. That's available at
simple.sourceforge.net. As I described, you use annotations in POJO classes to determine the structure
of your XML and map your classes to XML content. This is made easier with the schema compiler
that's included in Java SE.
I'm not actually going to use the compiler in this course. I'm going to show you how to hand code your
POJO classes, because that will give you a better understanding of how the annotations map to XML
structure and behavior. Here are some examples of annotations you can use in a POJO. You can place
annotations before class declarations, before fields that are members of a class, or before getters and
setters. Here's an example of two annotations for a single class.
The XmlRootElement annotation means that this class represents the root element of an XML
document. And the XmlType annotation is used to indicate what the name of that element should be,
and what the order of properties should be as the private fields of the class are mapped to child
elements of the root element. You can handle mapping of child elements very simply with the XmlType
annotation. That handles the name, phone, and about values in this example and turns them into
child elements of customer.
But to annotate an attribute, you need to add the annotation before the setter or getter method of a
particular value. This is called annotating the property. When you add the XmlAttribute annotation
before the getId method, that means take this value and represent it as an attribute of the current
element, rather than as a child element. You can also define a collection or a list of data objects by
creating a separate class that has a field representing a list of data objects.
And then you annotate the root element of that class. For example, in this code, I have a class named
Customers, and it has a private field, which is a list of customer objects. There is an XmlRootElement
annotation, which says that the equivalent element in the XML file is named customers, all lower
case. And then there's an XmlAccessorType annotation that tells JAXB to get the child elements of the root
element from the fields of the class.
Then there's an XmlElement annotation above the list. It's saying that each object within the list
should be named customer in the XML file, and that the data type to use in Java is the Customer class.
This code will all become much more clear in the exercises as you see how it translates to both creating
and reading XML content. Once you've annotated your classes, the actual amount of code it takes to
create or read XML is very small. Here's an example of creating XML content.
First you create an instance of a class called JAXBContext, and then create a Marshaller object. JAXB
refers to serializing XML as marshalling it. You create the Marshaller object and you set its properties.
Then you indicate where you want to create the XML. You can set your target as a File object, an output
stream, or a number of other types of objects. Then you call the marshal method, you pass in your data
object and your target, and the work is done for you.
Similarly, reading XML with JAXB takes just a few lines. Once again, you create the context object,
and this time an Unmarshaller object. Then indicate where the data is, either a file, an input source,
or some other source, and you pass that into the unmarshal method. And you get back the mapped
objects. So unlike the tree-based or the streaming APIs, there's no endless looping, there's no
examining of elements. You simply say, give me the data.
And all the logic is in the POJO classes' annotations. Some things to watch out for in JAXB include the
fact that JAXB stores the entire document in memory all at the same time. And so just like DOM and
JDOM, large XML content can cause memory problems. Also, the annotation model is completely
unique to JAXB. So once you start down the road of using JAXB in a particular application, it's pretty
tough to go to another API without having to completely rewrite your logic.
But all that being said, you might find that JAXB significantly reduces the amount of code you'll have
to write to map Java classes to XML structures. So in the next few movies, I'll show you some of the
code you need to write to annotate POJO classes and then create and read XML with the JAXB API.

Annotating POJO classes for use with JAXB


In order to create or read XML with JAXB, you'll add annotations to POJO classes. Because I'll be
using POJO classes to represent data that I've retrieved from a data set and to export to XML, I've
copied all of the data provider code over to this project. This project, JAXBAnnotations, has its own
copy of Customer.java, and its own copy of the data provider class that imports data from JSON. It also
has its own copies of the data files, customers.json and customers.xml, that I've been using throughout
this course.
And in fact, I've completely decoupled this project from the data provider project, which is now closed.
It's completely independent. To get ready to export to XML, start with a POJO class. I'm working with
the class Customer.java. You'll actually need two classes to represent a list of data: one class that
represents an individual data instance, which is this Customer class, and another class that's a container for
the list. And I'll show you how to create that in a moment.
Starting with the data entity class, in this case Customer, I'll begin by adding an annotation above the
class declaration. It's named XmlRootElement. Type the beginning of the annotation name, then press
Ctrl+Space. When it's auto-completed, an import statement will be added for it. All the annotations are
members of this package, javax.xml.bind.annotation. Next, add an annotation named
XmlAccessorType.
The accessor type determines how JAXB will get the data it needs to write an XML file. You set it with a
constant that's a member of the class XmlAccessType, and it can be one of these values: FIELD, NONE,
PROPERTY, or PUBLIC_MEMBER. If you set the value to PUBLIC_MEMBER, JAXB will look for public fields of
the class, along with public getter and setter pairs. My POJO class doesn't have any public fields. Another
alternative is to use FIELD, and then JAXB would get its data from the fields of the class, including
private ones. Or you can use PROPERTY.
And then JAXB looks for pairs of setters and getters, and that's the one that I'm going to use. So now,
for each setter and getter pair that's in this class, such as getId and setId, getName and setName, and so
on, a child element of the customer will be created. And the data will be drawn from the current data instance, as
JAXB reads the data. Finally, you can control the order in which the data is written. You can do that
with an annotation called XmlType.
The XmlType annotation has a couple of properties. For example, you could use this one to indicate the
name of the XML element that will be mapped to this class. You don't need that in this example. It's
going to go in another place in the code in a moment. But I am going to show you how to create a value
called propOrder. This controls the order of the properties. Type propOrder, then the equals assignment
operator, and then a pair of braces.
Within the braces, add the names of the properties in the order in which you want them to be shown in
the XML file. Don't add anything that's going to be represented as an attribute. That will come next. I'll
start with the name property, then phone, then about, then age, then balance, and I'm running out of
space, so I'll expand to full screen. Then active, and finally, joined. If you don't add the propOrder here,
the JAXB API will order them alphabetically, and this way, I'm controlling the property order
explicitly. Finally, indicate which properties you want to represent as attributes. You can do this using an
annotation called XmlAttribute, which you can place above either the getter or the setter method for that
particular property. And now, the Customer class is ready for marshalling, or serialization to XML. But
that's just the first step. The next step is to create a class that will contain a list of Customer objects.
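As a rough sketch, the annotated entity class described above might look something like this. The field types are assumptions (the walkthrough names the properties but not their types), and the sketch assumes the javax.xml.bind annotations are on the classpath, which is the case with Java SE 8 and earlier and requires a separate JAXB dependency on later releases.

```java
import java.util.Date;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;

// Sketch of the annotated entity class. Field types are assumptions, and the
// class is package-private here only so the sketch compiles from any file name.
@XmlRootElement
@XmlAccessorType(XmlAccessType.PROPERTY) // JAXB reads and writes through getter/setter pairs
@XmlType(propOrder = { "name", "phone", "about", "age", "balance", "active", "joined" })
class Customer {
    private int id;
    private String name;
    private String phone;
    private String about;
    private int age;
    private double balance;
    private boolean active;
    private Date joined;

    @XmlAttribute // id is written as an attribute, so it is not listed in propOrder
    public int getId() { return id; }
    public void setId(int id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getPhone() { return phone; }
    public void setPhone(String phone) { this.phone = phone; }

    public String getAbout() { return about; }
    public void setAbout(String about) { this.about = about; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }

    public double getBalance() { return balance; }
    public void setBalance(double balance) { this.balance = balance; }

    public boolean isActive() { return active; }
    public void setActive(boolean active) { this.active = active; }

    public Date getJoined() { return joined; }
    public void setJoined(Date joined) { this.joined = joined; }
}
```

Every property that is written as an element appears in propOrder; leaving one out, or listing the attribute there, is a common reason for JAXB to reject the class when the context is created.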
I'll go to my model package and create a new Java class. And I'll name it Customers. It's a plain old Java
object, so it won't extend any other classes. I'll click Finish and then I'll add a private field, which will
be a list of Customer objects. As you type in the code, be sure to add required imports. After finishing
the declaration, I'll use a quick fix and create a getter and a setter for the list object, and now I'm ready
to add annotations for JAXB.
Above the class declaration, I'll add two annotations. The first will be XmlRootElement. And I'll pass
in a setting that says the name of the root element of the XML file will be customers. Then, I'll indicate
where the data will come from for the customers element, and I'll do that with the XmlAccessorType
annotation. In the Customer class, I set the XmlAccessorType to PROPERTY, using the getters and setters.
In the Customers class, I'll set it to XmlAccessType.FIELD, just to show you another approach. Then,
finally, I need to tell JAXB how to get the data from the list of customers, which class to use to
contain the data.
And I also need to indicate what I want to name each customer element in the XML file. I'll do that
with an XmlElement annotation that I'll place above the field. First I'll set the name of the
element, that'll be customer, and then I'll indicate which class represents this data. I'll choose Customer
and then add .class. Then, as JAXB is looping through the list, it will know what type of data it's
dealing with. And that's all the annotations you need.
The top-level class is Customers. It represents the root element, customers, in the XML file and gets its
data from its private fields. The private field customers has instances of Customer, and each item
in that list will be represented by an XmlElement named customer. In the Customer POJO class, I have an
XmlRootElement annotation again. I'm indicating that I'm getting the data from properties, the setter
and getter methods. And I've explicitly set the property order, and finally I've indicated that the ID will
be represented as an attribute by placing the XmlAttribute annotation above either the getter or the
setter. Now you're ready for the final step, actually creating an XML file, and I'll show you that little bit
of code in the next movie.
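The wrapper class reviewed above could be sketched roughly as follows. The Customer class here is a trimmed-down stand-in with only two properties so the example stays self-contained, and, as before, the javax.xml.bind annotations are assumed to be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

// Container class: maps to the <customers> root element.
@XmlRootElement(name = "customers")
@XmlAccessorType(XmlAccessType.FIELD) // data comes straight from the fields
class Customers {
    // Each list item is written as a <customer> child element.
    @XmlElement(name = "customer", type = Customer.class)
    private List<Customer> customers = new ArrayList<>();

    public List<Customer> getCustomers() { return customers; }
    public void setCustomers(List<Customer> customers) { this.customers = customers; }
}

// Trimmed-down stand-in for the entity class (only id and name are shown).
@XmlRootElement
@XmlAccessorType(XmlAccessType.PROPERTY)
class Customer {
    private int id;
    private String name;

    @XmlAttribute
    public int getId() { return id; }
    public void setId(int id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}
```

Mixing FIELD access on the container with PROPERTY access on the entity is fine, because the accessor type is decided per class.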

Creating XML from annotated classes with JAXB


Once you've added annotations to your plain old Java object classes, you're almost ready to use JAXB
to create XML. I'm working in the project JAXBCreateXML. And these classes already have the
annotations that I described in the previous movie. Both the Customer and the Customers classes have
the required annotations. And I will add code now in my main class, CreateXMLWithJAXB. This class
has a main method, and all it's doing so far is retrieving data from the data provider class.
Remember this is a version of the data provider class that's a part of the current project and we are not
using the data provider project in these examples. The first step to get ready to output to XML with
JAXB, is to take your list of data objects and wrap it inside the class that you prepared for that purpose.
That's the Customers class that has the annotations. So, I'll create an instance of that class. Be sure to
add required imports as you go along. I'll name the object customers and I'll instantiate it with the
no-arguments constructor.
I should mention that every class that you use with JAXB must have a no-args constructor, or no
explicit constructors, in which case the compiler would generate a no-args constructor for you. If you only
have an explicit constructor with one or more arguments, JAXB will fail. Next, I'll take my data object,
my list of customers, and wrap it inside my customers object. I'll use the setter method, setCustomers,
and pass in the data object.
So now, I have a collection of data that's ready for use by JAXB, and I'll create an instance of a class
named JAXBContext. I'll name it context and I'll get its reference by calling a static method of that
class, using JAXBContext.newInstance. When you call this, you need to indicate which class is going
to be used. You only need to indicate the top-level class, in this example, the Customers class.
And you do it by passing in its class object with Customers.class. So, now my context knows which
class it's going to use. And now I can create a Marshaller object. The Marshaller object knows how to
take this data and output it, or marshal it, as XML. The name of the class is Marshaller and it needs to be
imported. I'll name it marshaller and I'll get its reference from the context object with
context.createMarshaller.
Next, create a target object. You can use a file, a writer object or a number of other data types. For my
first example, I'll use a Java StringWriter that I'll name sw, and I'll instantiate it with its no-arguments
constructor. And now I'm ready to output to the string. I'll call a method of the marshaller object named
marshal. When you call the marshal method, you'll pass in two arguments: a data set, which is a plain
old Java object with JAXB annotations, noted here as a JAXB element, and a target.
And you'll see that you can use files, output streams, results, writers, and you can even use classes that
are members of the StAX API, XMLEventWriter and XMLStreamWriter. I'm going to keep it simple.
I'm going to use a writer object, so I'll choose this version of the method. And for my JAXB element,
I'll pass in customers, and for my writer, I'll pass in my StringWriter object, and then I'll see what
happens. I'll output the string value to the console using system output, and I'll output the result of the
StringWriter's toString method.
Before I can test, I need to deal with some potential exceptions. I'm going to select these five lines of
code that create the context, the marshaller, the target, and then output to XML, and I'll wrap those in a
try-catch block. And I only get one catch clause, for the class JAXBException. I'll get rid of this
comment, and now I'm ready to test. I'll save and run this main class, and there's my XML. Notice that
it's compacted together and if you're trying to create XML to output over the web, you might want to
leave it in this form.
If you want to format it with indentation, though, you just need one more line of code. Place the cursor
after the line that's creating the marshaller object and call the setProperty method of the marshaller
object and pass in the following. For the property name, use Marshaller.JAXB_FORMATTED_OUTPUT, and
for the value, pass in true. Save your changes and run the code, and now you get well-formatted XML with
indentation. You can experiment with some of the other properties to see how you can change the
indentation and change the XML encoding.
Finally, let's see what happens when you output to a file. I already have a file called customers.xml in
my output folder, so I'm going to delete it. And then I'll come back to my code and I'll add code here.
I'll create an instance of the Java File class that I'll just name f, and I'll instantiate it with its constructor
method. And I'll pass in a literal string of ./output/customers.xml. Then I'll call the marshaller object
again, with marshaller.marshal.
Once again, I'll pass in customers as the object I'm marshalling, and this time I'll pass in the file object
as the target. I'll save and run again. I get the output but now I'll go back to the package explorer, I'll
refresh and there is the file that I just created. There's a lot more to learn about what's possible with
JAXB, including creating CDATA sections, dealing with namespaces and prefixes and many other
advanced capabilities.
But this is enough information to get you started creating simple XML files with JAXB's marshaller
and context classes.
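The marshalling steps above can be pulled together in a short sketch. The class name CreateXmlSketch, the helper methods toXml and toFile, and the two-field Customer stand-in are all hypothetical; the JAXB calls themselves (JAXBContext.newInstance, createMarshaller, setProperty, marshal) are the ones named in the walkthrough. The sketch assumes the javax.xml.bind API is on the classpath.

```java
import java.io.File;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

class CreateXmlSketch {

    @XmlRootElement(name = "customers")
    @XmlAccessorType(XmlAccessType.FIELD)
    public static class Customers {
        @XmlElement(name = "customer", type = Customer.class)
        private List<Customer> customers = new ArrayList<>();
        public List<Customer> getCustomers() { return customers; }
        public void setCustomers(List<Customer> customers) { this.customers = customers; }
    }

    // Hypothetical two-field stand-in for the course's Customer class.
    @XmlAccessorType(XmlAccessType.FIELD)
    public static class Customer {
        @XmlAttribute private int id;
        private String name;
        public Customer() { }                       // JAXB needs a no-args constructor
        public Customer(int id, String name) { this.id = id; this.name = name; }
    }

    // Builds the context and a marshaller for the top-level class.
    static Marshaller newMarshaller(boolean formatted) throws JAXBException {
        JAXBContext context = JAXBContext.newInstance(Customers.class);
        Marshaller marshaller = context.createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, formatted);
        return marshaller;
    }

    // Marshals to a string by way of a StringWriter target.
    static String toXml(Customers customers, boolean formatted) {
        try {
            StringWriter sw = new StringWriter();
            newMarshaller(formatted).marshal(customers, sw);
            return sw.toString();
        } catch (JAXBException e) {
            throw new IllegalStateException("Marshalling failed", e);
        }
    }

    // Same call with a File target instead of a writer.
    static void toFile(Customers customers, File target) {
        try {
            newMarshaller(true).marshal(customers, target);
        } catch (JAXBException e) {
            throw new IllegalStateException("Marshalling failed", e);
        }
    }

    public static void main(String[] args) {
        Customers customers = new Customers();
        customers.setCustomers(Arrays.asList(new Customer(1, "Ann"), new Customer(2, "Bo")));
        System.out.println(toXml(customers, true));
    }
}
```

Wrapping JAXBException in an unchecked exception is a design choice made for this sketch; the walkthrough's try-catch block around the same calls works just as well.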

Parsing XML with JAXB and annotated classes


Once you've added JAXB annotations to a set of POJO classes, it's a simple matter to read an XML
file. There are some things to know before you get started with the code. First, if you're dealing with
dates, JAXB has a particular date format that it expects. And it's the format that I've been using
throughout this course in the joined element of my customer data. JAXB looks for this format with a year
in four digits, a month and a day separated by hyphens, the uppercase T and then the time, in hour,
minute, second format.
It doesn't have explicit support for milliseconds. If you need to deal with other date formats, you'll have
some more coding to do. And the same thing is true for other advanced data types, but if you're
working with a plain vanilla XML file, the amount of code you need to write is minuscule. I'm going to
work with the class ReadXMLWithJAXB.java. The Customers and the Customer classes are using exactly
the same JAXB annotations that I used to create XML.
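To make the date shape described above concrete, here is a small sketch that parses and re-formats a timestamp written as year, month, and day separated by hyphens, an uppercase T, and then the time. It uses java.text.SimpleDateFormat purely to illustrate the pattern and is not a description of how JAXB handles dates internally. The class name, the sample value, and the choice of UTC are assumptions made for the example.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class DateShapeSketch {
    // Four-digit year, month and day with hyphens, a literal T, then hour:minute:second.
    static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss";

    static SimpleDateFormat newFormat() {
        SimpleDateFormat df = new SimpleDateFormat(PATTERN);
        df.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone so results are repeatable
        df.setLenient(false);                        // reject values such as month 13
        return df;
    }

    // Parses the text into a Date and formats it back to the same shape.
    static String roundTrip(String text) {
        try {
            Date parsed = newFormat().parse(text);
            return newFormat().format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not in the expected format: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("2013-06-15T10:30:00"));
    }
}
```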
In ReadXMLWithJAXB, I'll add the following code. I'll start with a JAXBContext object, which is the
same object I used to create XML. I'll create the object and I'll name it context, and I'll instantiate it
using the newInstance method. Just as I did when I created XML, I'll pass in the class object for
Customers, with Customers.class. Next, I'll create an instance of a class called Unmarshaller.
Remember that I used the Marshaller class to create XML. I'll use Unmarshaller to read it. And just as I
did with the marshaller, I'll get this reference from a method of the context object. Next, I'll indicate
where I'm getting the data from. The unmarshaller object has a method called unmarshal, and there are
a number of versions of it. One version knows how to deal with the Java File class. So, I'll create a File
object that I'll name f, and I'll instantiate it with the constructor method and I'll pass in the name and
location of the XML file.
I'll pass this in as a literal string of ./data/customers.xml. And now, I'm ready to read the XML file, and
I'll do that with a single statement. I'll create an instance of the Customers class that I'll name customers.
That's my top-level class that will contain my list of data, and I'll get that data by calling the
unmarshaller object's unmarshal method. And again, you can use this with many different sources: a file
object, an input source, a node, a reader, and classes that are members of the StAX API.
I'm using this first version, so I'll pass in my file object. Now, the unmarshal method says that it's going
to return a plain old Java object, but I know I'm getting back an instance of Customers because that's
what I said I wanted up here when I called newInstance. So, I'll use a quick fix and I'll add a cast to
this expression. Now the XML file has been read, and I'm ready for the next step, retrieving the data
from the wrapper class and outputting it to the console.
I'll create a variable which is a list of Customer objects. I'll name it data, and I'll get its value from the
getter method of the customers object. And then I'll loop through the results, and output them to the
screen. As I've done in previous exercises, I'll use a for-each loop. For each customer inside the loop,
I'll use the Customer object's toString method by passing the object into System.out.println. I'll deal with
some potential exceptions.
I'll go to one of the lines that shows that there's an exception and I'll use a quick fix, and this time I'll
use a throws declaration to handle the JAXBException, and that takes care of all the possible
exceptions. I'll save my changes and I'll run the code and there's the result. I've successfully read a
whole data set into memory. When you use JAXB, you're creating a tree of objects in memory, just like
DOM or JDOM. So, while JAXB has a very easy coding model, it doesn't deal with very large data sets
as well as, say, SAX or StAX, which are streaming APIs.
But, when you're dealing with small or medium-sized XML files, you may find that JAXB offers the
perfect combination of coding ease and maintainability.
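The reading steps above might be sketched like this. To stay self-contained the example unmarshals from a StringReader rather than the ./data/customers.xml file used in the walkthrough; the unmarshal method accepts a File in the same position. The class name ReadXmlSketch, the fromXml helper, and the two-field Customer are hypothetical, and the javax.xml.bind API is assumed to be on the classpath.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

class ReadXmlSketch {

    @XmlRootElement(name = "customers")
    @XmlAccessorType(XmlAccessType.FIELD)
    public static class Customers {
        @XmlElement(name = "customer", type = Customer.class)
        private List<Customer> customers = new ArrayList<>();
        public List<Customer> getCustomers() { return customers; }
    }

    // Hypothetical two-field stand-in for the course's Customer class.
    @XmlAccessorType(XmlAccessType.FIELD)
    public static class Customer {
        @XmlAttribute private int id;
        private String name;
        public int getId() { return id; }
        public String getName() { return name; }
        @Override public String toString() { return "Customer " + id + ": " + name; }
    }

    // Context, unmarshaller, then one call that returns the mapped objects.
    static Customers fromXml(String xml) {
        try {
            JAXBContext context = JAXBContext.newInstance(Customers.class);
            Unmarshaller unmarshaller = context.createUnmarshaller();
            // unmarshal returns Object, so the result is cast to the top-level class
            return (Customers) unmarshaller.unmarshal(new StringReader(xml));
        } catch (JAXBException e) {
            throw new IllegalStateException("Unmarshalling failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<customers>"
                + "<customer id=\"1\"><name>Ann</name></customer>"
                + "<customer id=\"2\"><name>Bo</name></customer>"
                + "</customers>";
        List<Customer> data = fromXml(xml).getCustomers();
        for (Customer customer : data) {
            System.out.println(customer);
        }
    }
}
```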

Part 7: Creating and Parsing with Simple XML Serialization
21m
Comparing Simple to JAXB
When developers refer to the simple library for XML, they might be referring to the Simple API for
XML, or SAX, that I described earlier in the course, or they might be referring to a much more recent API
called the Simple XML Serialization framework. This is an independent open-source project. It's not included
in either Oracle's JDK or the Android SDK. But you can download it for free from simple.sourceforge.net.
This Simple API lets you both read and create XML content using an annotation-based model that's in
many ways similar to JAXB, but much lighter weight, both in memory usage and in the amount of code
you have to write. And unlike JAXB, it works okay on Android. You do have to add another JAR file to
your Android app, which increases the size of the app, but if you like the coding style it might be
worthwhile. The annotation code looks very different than JAXB, and is somewhat simpler.
In Simple, you place annotations before class declarations or fields. In this example, I'm saying that the
Customer class has an annotation of Root, which means that it represents the root element. Then I'm saying
that the ID will be stored as an attribute and the name value will be a child element. Just as in JAXB,
you can define a collection class. Create a separate class with a private field representing the collection.
Then annotate the class as a root element and the children as a list.
In this example, the Customers class is the root element and the customers list within the class will
contain the child elements. That List field is annotated with ElementList and a property of inline set to
true. And I'll describe what that does when we get into the demonstrations. Once you've done your
annotations, it takes just a few lines of code to either create or read XML content. In this example, I'm
creating an instance of my Customers class that has the top-level annotations and passing data into it.
Then I'm creating an object called a serializer. This is a class that's a member of the Simple library.
Then I'm defining a target where I want to create my XML. Just as with other Java-based APIs for
XML, you have some options. The serializer object's write method in this example is using a FileWriter
wrapped around a file. You can also use an output stream. And once you call the write method, the
XML content will have been created.
Reading XML with Simple is just as easy. Once again, create a serializer, then create a source object, in
this case a file, and then read the content. And all of the logic for how to interpret the XML is handled
by the annotations in the POJO classes. Just like JAXB, Simple is a binding API and stores the entire
document in memory. And so if you have to deal with larger documents, it can cause Java to run out
of memory. Also, as with JAXB, the annotation model is completely unique to this library.
So, if you decide to use it in a particular application and then later on decide to change the API you
want to use, you'll need to completely rewrite your logic. And finally, as I mentioned, you do need to
add the JAR file to your application. This can particularly be an issue for Android. The Simple JAR file is
about half a megabyte, and whether the additional size in your app is worth the programming
convenience is a matter for you to decide. So, in the next few movies, I'll show you how to get started
coding with this last API of the course, the Simple XML Serialization framework.

Annotating POJO classes for use with Simple


The Simple XML Serialization library is a third-party open source library. And to use it, either on
Android or in any other environment, you'll need to download it from the website. You'll be downloading
a zip file. Extract it anywhere on your hard disk. I've extracted it to my desktop. And then, open up the
resulting folder, go to the jar subfolder, and copy the JAR file. I'm using version 2.7.1. You should be
able to use this or any later version.
I've also included this JAR file in the course's exercise files. I'll copy the JAR file to the clipboard, then
go to Eclipse. And I've opened a project named simple annotate from the exercise files. I'll go to the
project's libs folder, which already has a copy of the JSON.simple library, and I'll paste this library into
place. And then I'll add it to the project's build path. And now all of its classes and annotations are
available to me.
The first step in reading or creating XML with Simple is to add annotations to the POJO classes. This
is pretty much the same model as with JAXB. But the names of the annotations are a little different.
And in fact, the coding is a little bit simpler. I'll start with the Customer class, which represents a single
customer object. Begin by adding a Root annotation above the class declaration. Be sure to import it.
This and all of the other annotations in the Simple framework are members of the package
org.simpleframework.xml.
You can use the Root annotation in its simplest form, as long as the name of your class matches the
name of the root element of the XML file. It will be changed automatically to all lower case, as it's
translated to XML. But if you want to explicitly indicate the name of the root, you can set a property
called name. It will look like this: name equals, and then the name of the element as a literal string.
Next, add annotations to each of the private fields of the class.
For each field, choose between an attribute and an element. In my XML structure, the id is represented
as an attribute, so I'll add an Attribute annotation. Once again, be sure to import it. For all of the other
fields, I'll use an Element annotation. Again, be sure to use the right import. After I've added that one,
and made sure that I've added the import, I'll copy that line of code to the Clipboard, and paste it in,
above each of the fields.
Each of these values will be saved as a text node automatically in the XML file. For any value that you
want to wrap inside a CDATA section, add a property to the Element annotation. It's called data. And
you'll set it to a value of true. And that means you want to treat the following field as a CDATA
section instead of a simple text node. And that's it for the data element. I've added the Root annotation
above the class. And added an Attribute or an Element annotation above each private field.
I'll save those changes and next, I'll move to my Customers class. The Customers class looks exactly the
same way it did with JAXB. It has a private field representing a list of Customer objects. It's named
customers. And then, a public setter and getter. Just as I did with the Customer class, I'll create a Root
annotation above the class declaration. I'll be sure to import it and I'll add the name property. And in
this case, I'll set it to customers.
Next, I'll annotate the private field, the list of customers. I'll use an annotation called ElementList. I'll
be sure to import it and save the change. That's acceptable, but if you leave this annotation the way it is,
your nesting of the XML will be too deep. You'll get a customers element inside a customers
element. To indicate that you want this list to be the next level inside the root element of the XML file,
add a property called inline, set to a value of true.
And now, that will take the list of customers and make it a child of the root. And that's all the
annotations you need. So to review: the top-level class, which contains my list of data, has a Root
annotation above the class declaration and a name property indicating the name of the equivalent XML
element. And then the list, which is the private field, has the ElementList annotation with the setting of
inline equals true. And that means that this list will be the next level inside the XML, right under the
root.
The Customer class has a Root annotation, once again with the name, and then each of the fields has
either an Attribute or an Element annotation. And for those fields that you want to represent as CDATA
sections, you add the data equals true setting. So the annotations are complete and you're now ready to
read and write XML, and we'll get to those steps in the next set of movies.
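The two annotated classes reviewed above could be sketched as follows. Only three of the Customer fields are shown and their types are assumptions; the sketch also assumes the Simple XML JAR (the walkthrough uses version 2.7.1) is on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import org.simpleframework.xml.Attribute;
import org.simpleframework.xml.Element;
import org.simpleframework.xml.ElementList;
import org.simpleframework.xml.Root;

// Wrapper class: maps to the <customers> root element.
@Root(name = "customers")
class Customers {
    // inline = true puts each <customer> directly under the root
    // instead of nesting the list inside an extra wrapper element.
    @ElementList(inline = true)
    private List<Customer> customers = new ArrayList<>();

    public List<Customer> getCustomers() { return customers; }
    public void setCustomers(List<Customer> customers) { this.customers = customers; }
}

// Entity class: one <customer> element per instance.
@Root(name = "customer")
class Customer {
    @Attribute                 // written as an attribute of <customer>
    private int id;

    @Element                   // written as a child element with a text node
    private String name;

    @Element(data = true)      // written as a child element wrapped in a CDATA section
    private String about;

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getAbout() { return about; }
    public void setAbout(String about) { this.about = about; }
}
```

Unlike the JAXB version, the annotations sit on the private fields themselves, so no accessor-type setting is needed.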

Creating XML from annotated classes with Simple


Once you've added annotations to your POJO classes, you're ready to create and read XML with the
Simple framework. I'm working in a project called SimpleCreateXML. And I'll open the class
CreateXMLWithSimple that's available in the create package. This class has a main method. And right
now, it's just retrieving data from a JSON file and asking for a small data set, that is, 10 records. The first
step is to wrap the data set inside my Customers class.
This is the class that has the annotations. It has the private field, the list of Customer objects, and a
setter and a getter. I'll create the instance of the class, and I'll instantiate it with its no-arguments
constructor. Be sure to import the class. Next, I'll pass in the data by calling the setter method, with
customers.setCustomers, and I'll pass in the data object. Now I'm ready to serialize. JAXB called this
step marshalling XML. In Simple we call it serializing, and so we'll create an object called a serializer.
Serializer is a member of the package org.simpleframework.xml, and you'll instantiate it from a
constructor of a class called Persister. Next, create a target. You will be able to serialize to a string, to a
file, and to a variety of other targets. I'll create a StringWriter object that I'll name sw, and I'll instantiate
it from the class's no-arguments constructor. Now, to write the XML, call a method of the serializer
object called write.
Your available targets are files, output nodes, output streams, and writers. I'll choose this last version.
The first argument is my customers object that contains my data, and the second is the StringWriter.
And then I'll show the result by using system output. And I'll output the result of the StringWriter's
toString method. I'll reorganize my code a little bit, and I'll deal with the one possible exception. I'll
select these two lines of code and surround them with a try-catch block, and I'm ready to test.
I'll save and I'll run the code, and there is the result. I'm outputting a well-formed XML packet. I
indicated that I wanted my about element's data to be wrapped in a CDATA section, and I can see that
that's happened successfully. You might also notice that the joined element's date format is different
than the date format that's supported by JAXB. In JAXB you get the date and the time with a T in the
middle. In the Simple framework you get a slightly different format.
I will show you how to deal with custom dates when we get into reading XML with Simple. But this
should be enough to get you started with Simple, creating XML strings or, by simply changing the target
object, writing XML to a file.
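A minimal sketch of the serializing steps above, assuming the Simple XML JAR is on the classpath. The class name CreateXmlWithSimpleSketch, the toXml helper, and the three-field Customer are hypothetical stand-ins for the course's classes.

```java
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.simpleframework.xml.Attribute;
import org.simpleframework.xml.Element;
import org.simpleframework.xml.ElementList;
import org.simpleframework.xml.Root;
import org.simpleframework.xml.Serializer;
import org.simpleframework.xml.core.Persister;

class CreateXmlWithSimpleSketch {

    @Root(name = "customers")
    public static class Customers {
        @ElementList(inline = true)
        private List<Customer> customers = new ArrayList<>();
        public void setCustomers(List<Customer> customers) { this.customers = customers; }
    }

    // Hypothetical three-field stand-in for the course's Customer class.
    @Root(name = "customer")
    public static class Customer {
        @Attribute private int id;
        @Element private String name;
        @Element(data = true) private String about;   // wrapped in CDATA on output
        public Customer() { }
        public Customer(int id, String name, String about) {
            this.id = id; this.name = name; this.about = about;
        }
    }

    // Serializes the wrapper object to a string by way of a StringWriter target.
    static String toXml(Customers customers) {
        Serializer serializer = new Persister();
        StringWriter sw = new StringWriter();
        try {
            serializer.write(customers, sw);
        } catch (Exception e) {                        // write declares a plain Exception
            throw new IllegalStateException("Serialization failed", e);
        }
        return sw.toString();
    }

    public static void main(String[] args) {
        Customers customers = new Customers();
        customers.setCustomers(Arrays.asList(new Customer(1, "Ann", "Likes XML")));
        System.out.println(toXml(customers));
    }
}
```

Swapping the StringWriter for a java.io.File in the write call is the only change needed to write to disk instead.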

Parsing XML with Simple and annotated classes


To read XML with the Simple framework, you can use the same annotated classes that you used to
create XML, using the Root, Attribute and Element annotations in a POJO class representing a single
instance of data, and the ElementList annotation in a class that wraps a list of data. I'm working in the
project SimpleReadXML and I've opened the class ReadXMLWithSimple.java. In the main method, I'll
add the following code. First, I'll create an instance of the Serializer interface.
This is the same interface I used to create XML. And I'll instantiate it with new Persister. The Persister
class has a number of constructor methods. For the moment, I'm going to use the no-arguments
version. I'll be coming back to this and changing it later. Next, I'll indicate where I'm getting the XML
from. The Persister can read a file, an input stream, or a number of other types of sources. I'll create an
instance of the Java File class and I'll name it source, and I'll use it to reference the customers.xml
file, which is in this project's data folder, with ./data/customers.xml. As always in Java, use forward
slashes, regardless of whether you're working on Windows or Mac. Now, I'm ready to read the XML
file. When I get the data back, it'll appear as an instance of the Customers class. That's the wrapper class
I'm using to contain my list of data. So, I'll create an instance of that class. I'll use the Customers class as
the data type.
And I'll get its reference by calling the serializer's read method. There are a number of versions of the
read method available. I'll use the first one that's on my list. The first argument indicates the data type
that's being returned from the read method. And I'll pass in Customers.class. And the second argument
is the source of the data. And I'll pass in the source object that points to my XML file. I get back a
reference to a Customers object, and it contains my list of Customer objects.
So, I'll extract that data. I'll create a Java List, making sure to import List from java.util, and the type
that it contains is Customer. I'll name that data and I'll get its reference by calling the getter method of
the customers object, with customers.getCustomers. Then, as I've done in previous exercises, I'll use a
for-each loop and just output the data from each customer to the console with system output, passing in
the Customer object, and that results in calling the toString method of the Customer class.
The read method might throw an exception, so I'll move the cursor up to that line, use a quick fix,
and add a throws declaration to the current method. For a simple XML file that doesn't have any
complex data types, you're done. When I try to run this code, though, I'll get an error. And the problem
is that one of my values is a date. And it's rendered in the XML file in a format that the Simple
framework doesn't understand. So, I have a little bit more work to do.
I'm going to create a new class to handle that format. It's called a transform class, and it's designed to
let you handle any data format in XML and transform that string into whatever data type it should be.
I'll create this new class in my read package. So, I'll go back to my Package Explorer view, right-click
on the read package, and create a new class. I'm going to name this class DateFormatTransformer. It's
not going to extend any special classes, but it is going to implement an interface named Transform.
Be sure to choose the Transform interface. It's the one in the package
org.simpleframework.xml.transform. For the moment, don't create the inherited abstract methods. You'll
handle that after you've created the class itself, and I will show you why. When you create the class, you
need to indicate the data type that's being transformed to. This is your target data type. You'll receive
data as a string and return it as a Java type, so I am going to pass in Date.
Use the version of Date from java.util. Now that you've indicated what data type you're working with,
you're in a better position to implement the abstract methods. Move the cursor up to the class
declaration, and press Ctrl+1 on Windows or Cmd+1 on Mac for a quick fix, and add unimplemented
methods. And you'll see that the correct data type is added to these methods automatically for you. The
next step in creating this class is to add a private field, a DateFormat object, that will indicate the format
that's being used to transform this data.
I'll create the DateFormat object as a private field. And I'll name it df. Then, I'll create a constructor
method for the class. The constructor method will receive an instance of this DateFormat object. As the
constructor is called, it'll save the value that's passed in as an argument to the private field. Now, I'm
ready to implement the read and write methods. The transform object's read method will be called
whenever a string is encountered in the XML file and it has to be turned into a native type. I'll return
df.parse and I'll pass in arg0.
You can rename the argument to something more meaningful if you like. And then for the write method,
I'll be receiving a Date object and I'll do the formatting in the other direction. I'll return df.format
and I'll pass in arg0, which is the date. And again, you can rename that argument if you like. So that
class is now complete. Again, it's an implementation of the Transform interface, which has the two
required methods, read and write, to transform in both directions.
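The transformer class built in the steps above might look roughly like this. It assumes the Simple XML JAR is on the classpath; the argument names replace the generated arg0, as the walkthrough suggests you may do.

```java
import java.text.DateFormat;
import java.util.Date;
import org.simpleframework.xml.transform.Transform;

// Converts between the date text found in the XML and java.util.Date,
// using whatever DateFormat is handed to the constructor.
class DateFormatTransformer implements Transform<Date> {

    private final DateFormat df;

    public DateFormatTransformer(DateFormat df) {
        this.df = df;
    }

    // Called when a string from the XML has to become a Date.
    @Override
    public Date read(String value) throws Exception {
        return df.parse(value);
    }

    // Called when a Date has to be written back out as a string.
    @Override
    public String write(Date value) throws Exception {
        return df.format(value);
    }
}
```

Because the format is injected rather than hard-coded, the same class can serve any date pattern the XML happens to use.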
Now I'll go back to my main class, and I have a few lines of code to add there. I'll place the cursor
inside the main method before I create the serializer. And I'll create a new DateFormat object, using
the SimpleDateFormat constructor and passing in the XML date format string. Be sure to import both
the DateFormat and the SimpleDateFormat classes. Next, create an instance of a class called
RegistryMatcher, which is a part of the Simple framework's transform package.
We'll name this simply m and instantiate it with the class's no-args constructor. Then, use a method of
the RegistryMatcher called bind. The bind method takes two arguments, the class that you're going to
transform and an instance of your custom transformer. I'll pass in Date.class. Be sure to import Date
from java.util. And then for the transformer, pass in an instance of your transformer class, using the
constructor method that we just created.
And pass in the DateFormat object. So, here are these three steps again. Create the DateFormat object,
create the RegistryMatcher, and bind everything together by passing in the Date class and the instance
of your transformer. There's one last step, and that's to pass the RegistryMatcher into the Persister object.
You'll use this version of the constructor method. I'll pass in m, the instance of the RegistryMatcher.
So, it's a little bit of extra code to handle these special date formats, but now when you save
and run your code, you should successfully parse the XML file.
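
The wiring steps can be sketched as follows. This is an illustration under stated assumptions, not the course's exact code: the Transform, RegistryMatcher, and Persister types declared here are small local stand-ins that only mirror the shape of the Simple classes so the example runs on its own. In a real project you would delete them and import the library's own classes instead. The DateBindingDemo class, the buildSerializer method, and the "yyyy-MM-dd" pattern are hypothetical.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// --- Local stand-ins (assumptions), NOT the Simple library's code ---
interface Transform<T> {
    T read(String value) throws Exception;
    String write(T value) throws Exception;
}

// Stand-in registry: remembers which transformer handles which class.
class RegistryMatcher {
    private final Map<Class<?>, Transform<?>> transforms = new HashMap<>();

    <T> void bind(Class<T> type, Transform<T> transform) {
        transforms.put(type, transform);
    }

    Transform<?> match(Class<?> type) {
        return transforms.get(type);
    }
}

// Stand-in serializer: only holds on to the matcher it was given.
class Persister {
    final RegistryMatcher matcher;

    Persister(RegistryMatcher matcher) {
        this.matcher = matcher;
    }
}

// The custom transformer described in the transcript.
class DateFormatTransformer implements Transform<Date> {
    private final DateFormat df;

    DateFormatTransformer(DateFormat df) {
        this.df = df;
    }

    @Override
    public Date read(String value) throws Exception {
        return df.parse(value);
    }

    @Override
    public String write(Date value) {
        return df.format(value);
    }
}

// Hypothetical main class showing the steps in order.
class DateBindingDemo {
    // Assumed pattern; this is the one place to change if the XML format changes.
    static final String XML_DATE_FORMAT = "yyyy-MM-dd";

    static Persister buildSerializer() {
        // Step 1: create the DateFormat object.
        DateFormat format = new SimpleDateFormat(XML_DATE_FORMAT);
        // Step 2: create the RegistryMatcher.
        RegistryMatcher m = new RegistryMatcher();
        // Step 3: bind the Date class to an instance of the transformer.
        m.bind(Date.class, new DateFormatTransformer(format));
        // Last step: pass the matcher into the Persister.
        return new Persister(m);
    }

    public static void main(String[] args) throws Exception {
        Persister serializer = buildSerializer();
        @SuppressWarnings("unchecked")
        Transform<Date> t = (Transform<Date>) serializer.matcher.match(Date.class);
        System.out.println(t.write(t.read("2014-03-15")));
    }
}
```

Keeping the format string in one constant is what lets a change in the XML date format be handled in a single place.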

And the code is constructed in such a way that if your date format changes in the XML, you have only
one place where you need to make the change, in your main class. The DateFormatTransformer class is
written for flexibility, so that it can accept any DateFormat object. So, again, the goal with a binding
API such as Simple is to reduce the amount of code in your main class and use annotations. One of the
great things about the Simple framework compared to JAXB is that, as an open source library, you can
include it in your Android apps. And if you're wondering about the impact it might have on the size of
your applications, take a look at the Simple XML library file, and you'll find that it's about half a
megabyte. Not large at all, and it provides enormous functionality.

Next steps
Thanks for joining me on this tour of Java-based APIs for processing XML. Where you go next
depends on what you want to do with XML. If you're interested in XML as a way of storing persistent
data, you might be curious about how databases work with XML, and you can find some information
about that in the course Foundations of Programming: Databases. If you're interested in web services,
working with SOAP or other flavors of XML, check out the course Foundations of Programming: Web
Services. And if you're an Android developer, you can find more information about XML on Android,
including the use of the XML Pull Parser API and other ways of storing and working with data on
Android devices, in the course Android SDK: Local Data Storage. However you decide to put Java and
XML to work, I hope that this course has helped you get started, choosing the right API and learning
how to put it to work in your Java application.
