MASARYKOVA UNIVERZITA
FAKULTA INFORMATIKY

Comparison of JPA providers and issues with migration

DIPLOMA THESIS

Lukáš Šembera

Brno, June 2012

Declaration
Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Lukáš Šembera

Advisor: Jiří Pechanec, Red Hat Czech, s.r.o.

Acknowledgement
I would like to thank my technical advisor Jiří Pechanec from Red Hat Czech for his valuable comments and suggestions. I would also like to thank my fiancée Daria for her support during writing.

Abstract
This thesis aims to compare three implementations of the JPA standard, specifically Hibernate, OpenJPA and EclipseLink. Besides the comparison, it also describes the migration of various real-world applications between those JPA implementations and documents the issues that developers might typically run into. The practical part involves developing an application which supports migrating projects between those three JPA providers.

Keywords
JPA, JPA2, Hibernate, OpenJPA, EclipseLink, Java, persistence, relational, databases, Scala

Contents
1 Introduction
  1.1 Database management systems
    1.1.1 Relational databases
    1.1.2 Object-oriented databases
    1.1.3 NoSQL databases
  1.2 Object-relational mismatch
  1.3 Brief history of Java persistence solutions
    1.3.1 JDBC
    1.3.2 EJB 2.x entity beans
    1.3.3 JDO
    1.3.4 myBatis
  1.4 JPA
  1.5 Goals of the thesis
2 Comparison of JPA providers
  2.1 Methodology of the comparison
  2.2 Identifier generation
  2.3 Performance
    2.3.1 Batch inserts
    2.3.2 Searching by ID
    2.3.3 Basic JPA QL test
    2.3.4 Basic criteria API test
    2.3.5 Aggregate function
    2.3.6 Performance summary
  2.4 Type conversion
  2.5 Caching support
  2.6 Entity lifecycle and transactional events
  2.7 Schema generation
  2.8 Support for stored procedures
  2.9 Integrating with other frameworks
  2.10 Licenses
  2.11 Documentation quality
  2.12 Build systems
  2.13 Summary
3 Experimental migration of JPA applications
  3.1 Migrating from Hibernate
  3.2 Migrating from OpenJPA
  3.3 Migrating from EclipseLink
  3.4 Migration summary
4 Automatic migration tool
  4.1 The application architecture
  4.2 Java source files parsing
  4.3 Ideas for a further development
5 Conclusion
A Generated database schemas
  A.1 Hibernate
  A.2 OpenJPA
  A.3 EclipseLink
Listings
1.1 Sample of JDBC code
2.1 DDL defining sample database schema
2.2 Sample stored procedure
4.1 Recursively searching the abstract syntax tree for vendor-specific annotations using Scala pattern matching
A.1 Hibernate-generated sample database schema
A.2 OpenJPA-generated sample database schema
A.3 EclipseLink-generated sample database schema

List of Figures
2.1 ER diagram of sample database schema
4.1 Class diagram of the migration application

List of Tables
2.1 Batch inserts on PostgreSQL test results
2.2 Batch inserts on MySQL test results
2.3 Find by ID test results
2.4 Fetch all users using JPA QL test results
2.5 Fetch all users using criteria API test results
2.6 Complex join using JPA QL test results
2.7 Complex join using criteria API test results
2.8 Feature matrix

1 Introduction
Every application, except the most basic ones, has to deal with data. The very first computers were designed as black boxes receiving input, doing some calculations and producing output. Since then, computers have become much more complicated and nowadays they do much more than such simple data processing. Nevertheless, they still operate on data stored on some kind of permanent storage device, such as a hard drive.
Input data for an application could be saved, without much thinking, into an ordinary text file. However, such files are next to impossible to process by machine because they do not follow any rules which would describe their structure. For this reason, a variety of rules the data have to follow are often introduced (e.g. the structure is described by XML with an appropriate XML Schema definition).
Even if the data are in an easily computer-readable form, the biggest problem with this file-based approach remains. It is still just a text file and, therefore, the data access is limited by the I/O operations of the operating system. Demands of current enterprise applications, however, go far beyond the possibilities of such file-based persistence. We require reliability, transaction management, high-performance concurrent access, advanced user access control and much more. To support all of these advanced features, database management systems have been invented.

1.1 Database management systems

A database management system (DBMS), as defined in [1], is software designed to assist in maintaining and utilizing large collections of data. Each DBMS has its model, which describes data, data relationships, semantics and consistency constraints [2]. It is basically a theoretical foundation upon which database management systems operate.
During the last few decades, several database models have been invented. In the 1960s, IBM introduced its database management system IMS, which internally uses the hierarchical database model. The hierarchical model stores data in records, which are connected with each other through links, creating tree-like structures [2]. An evolution of the hierarchical model is the network model, which allows records to be connected in arbitrary graphs, thus making data modelling more flexible (e.g. it allows many-to-many relationships between records). Even though hierarchical and network databases exist and are still in use¹, the models have many flaws (further discussed in [4]), which make their usage in certain scenarios particularly complicated.
1. Probably the best known hierarchical database is the Windows System Registry [3].

1.1.1 Relational databases
In 1970, E. F. Codd published a revolutionary paper [5], in which he laid out the concept of the relational data model, which is the theoretical foundation of relational databases. Thanks to its flexibility², simplicity and strong but simple formal background (which allows mathematical reasoning about data), its popularity grew rapidly. Many commercial and open-source implementations exist; they are very mature and industry-proven, and the relational model itself is very well understood and documented. For these reasons, relational databases are effectively an industry standard and knowledge of them is essential for every programmer.
2. By flexibility I mean the ability of the relational model to hide its internal data representation. Clients thus do not need any knowledge of how data are physically stored and, therefore, are not affected when the server implementation changes.

1.1.2 Object-oriented databases
In the last decade, under the influence of object-oriented programming, the concept of object-oriented (OODBMS) and object-relational (ORDBMS) database management systems has arisen. An OODBMS allows object graphs to be stored in the database directly and is very often integrated with the programming language itself. Thus, it provides a homogeneous environment and removes the necessity of various transformations when data are passed back and forth between the application and the data layer. Even though object-oriented databases have undeniable benefits and advantages, their popularity is not very high. This is not only because of the enormous amounts of data that are already stored in relational databases (and the migration of which would not be cost-free), but also because of some technical issues they are still facing and which have not yet been resolved³. Moreover, vendors of relational databases are integrating various object-oriented features into their products and thus are making the need for pure object-oriented databases less urgent.
3. Discussed further at http://www.leavcom.com/db_08_00.htm

1.1.3 NoSQL databases
Recently, with the rise of interest in cloud computing, a new category of databases has emerged, the so-called NoSQL⁴ databases. NoSQL is neither a specific database model, nor an evolution of relational or object-oriented databases; it is rather a group of database products which are suited to specific scenarios, often where other solutions fail. They often offer only a subset of the features of relational databases, but they are superior in certain characteristics. For example, MongoDB is a document-oriented database, which shines at speed and scalability, but by design it lacks decent transaction management⁵ and, therefore, its use case is in large clusters where transactional behaviour is not crucial.
4. Abbreviation of Not Only SQL.
5. MongoDB supports atomic operations on a single document.

1.2 Object-relational mismatch

Currently, most data are stored in relational databases. In programming languages, however, the object-oriented approach predominates. It best reflects reality, modelling interactions of entities and their behaviour. In object-oriented programming, there are fundamental concepts like association, inheritance or polymorphism, which do not have corresponding counterparts in the world of relational databases.
The object-relational mismatch occurs when data, representing some business information we need to process, are stored in a relational database. In the application processing the data, however, the object-oriented approach is used and everything is modelled using objects and other OOP concepts. Therefore, transformations are needed each time data are passed between the application layer and the data layer. These transformations might not be complicated if the objects are simple data holders containing only basic data types, but once we want to make use of advanced OOP features, things get much more complicated.

1.3 Brief history of Java persistence solutions

1.3.1 JDBC
The object-relational mismatch can be tackled by hand using plain JDBC, which is Java's API for database access. The API is database-independent⁶ and database vendors provide JDBC drivers for their systems. Here is a very simple piece of code which saves a person into a database using JDBC:

    // Requires java.sql.PreparedStatement and java.sql.SQLException;
    // `connection` is an open java.sql.Connection field.
    public void savePerson(Person p) throws SQLException {
        String query = "INSERT INTO PERSON VALUES (DEFAULT, ?, ?)";
        // try-with-resources closes the statement even if the update fails
        try (PreparedStatement stmt = connection.prepareStatement(query)) {
            stmt.setString(1, p.getName());
            stmt.setString(2, p.getSurname());
            stmt.executeUpdate();
        }
    }

Listing 1.1: Sample of JDBC code

The advantage of this approach is that it gives the programmer full control over the SQL queries sent to the database. On the other hand, one can observe that such code is rather low-level and pollutes our service layer with SQL statements and checked exceptions. It also leads to procedural code because it forces the programmer to unpack primitive properties⁷ from domain objects and put them into SQL statements manually. This is especially tedious and error-prone when we work with larger object graphs and cascaded saving.
For the reasons above, we usually look for a tool or a framework which does the tedious work for us: it analyses our classes, generates SQL statements, automatically binds parameters, etc. In the coming paragraphs, I'll briefly discuss different approaches to Java object-relational mapping.
It is important to remember, however, that Java database programmers can never avoid JDBC entirely. Since all persistence solutions are built on top of JDBC, understanding it is essential to fine-tune the persistence framework in certain scenarios or to check logs in case something goes wrong. As Gavin King and Christian Bauer state in their book, high-level persistence solutions are not here for programmers who do not want to learn or do not understand JDBC, SQL or the relational model itself. They are here for those who have already done it the hard way [6].
6. Even though the JDBC API is database-independent, the SQL statements are not, so it is important to be careful when utilizing non-standard SQL queries.
7. By primitive properties I mean properties of primitive data types, which are directly supported by database systems.

1.3.2 EJB 2.x entity beans
Since the very beginning of the J2EE specification, there has been a technology aimed at Java persistence, called EJB entity beans. Entity beans are container-managed components providing various services, such as persistence or transaction management. The specification, however, was largely over-engineered from the beginning [7]. It builds on the fundamental concept that persistence should be non-intrusive to the application and rather be a service provided by the container. This leads to overwhelming complexity of both the specification and the applications using it. EJB entity beans were a widely used technology, but due to their complexity and general unhappiness with the specification, companies were often forced to create various proprietary persistence solutions. Several open-source frameworks have also been created, with Hibernate being the most widely used one.
For backward compatibility, EJB entity beans are still present in the Java EE specification, so every Java EE compliant application server has to support them. They are, however, considered deprecated in favour of the new JPA specification.
1.3.3 JDO
With rising frustration over EJB entity beans, there was an attempt to come up with an alternative: a new persistence specification which would work with POJOs⁸ and finally abandon the concept of container-managed persistence. This specification is called Java Data Objects. Even though JDO is quite powerful, in some aspects even more powerful than its successor, JPA⁹, it failed to gain wider popularity among developers and become mainstream. JDO requires byte-code manipulation to enhance persistent classes and, therefore, is quite complicated as well. Mike Keith [7] also claims that one of the reasons why JDO has failed is its inherently object-oriented query language, which does not play well with programmers used to relational databases.
8. Abbreviation of Plain Old Java Object, denoting ordinary Java classes which do not follow any special conventions or framework rules (http://www.martinfowler.com/bliki/POJO.html).
9. JDO, for example, supports non-relational data stores, whereas JPA does not.

1.3.4 myBatis
myBatis (formerly iBatis) is a lightweight persistence framework that gives the programmer full control over the SQL queries sent to the database. It does not generate any SQL code; it merely maps custom SQL statements to the properties of entities being stored in the database. Despite all its advantages and interesting ideas, it is not a full-blown persistence solution since it lacks features demanded from a general-purpose persistence framework, such as portability across different database systems (all the SQL code is database-specific). Moreover, myBatis is not part of the Java EE specification, which also means that it does not integrate with the rest of the Java EE ecosystem and, therefore, features like container-managed transaction handling, entity lifecycle callbacks or JSR-303 Bean Validation are not supported.

1.4 JPA

The EJB 3 specification, as part of the completely reworked Java EE 5 released in 2006, contained a new specification regarding persistence: the Java Persistence API¹⁰. JPA was a response to users' increasing frustration with the complexity of EJB 2.x entity beans. Authors of proprietary persistence frameworks and other experts were invited to sit in groups working on a brand new Java persistence specification, which would replace EJB entity beans.
JPA2 (included in Java EE 6, released in 2009) is an evolution of JPA. It is based on the experience with JPA and reflects users' critique (mostly about missing features which were already present in other proprietary persistence frameworks). In this text I will only focus on the JPA2 specification and its features¹¹.
10. In Java EE 5, the JPA specification is formally a part of the EJB 3 specification. The decision to bind them together was probably quite unfortunate, though, because JPA is not in any way dependent on an EJB container and thus works perfectly fine in Java SE environments. JPA2 is already a separate specification, formally independent of EJB.
11. From this point on, wherever I use JPA, I mean the JPA2 specification. I will use the term JPA2 only to emphasise that a particular feature was introduced in the new JPA2 standard.

1.5 Goals of the thesis

In this thesis, I will:
• Compare three different JPA implementations and build a feature matrix showing their strengths and weaknesses.
• Take an open source project written in each JPA implementation, migrate it to the other two, test it on Oracle, PostgreSQL and MySQL and document the issues I run into during the migration process.
• Build a migration tool which will support migrating OpenJPA and EclipseLink projects to Hibernate.

2 Comparison of JPA providers


JPA2 is a persistence standard for the Java platform defined by Sun Microsystems in [8]. The JPA specification itself does not contain any usable code; it only describes persistence concepts and provides standard interfaces, which all standard-compliant frameworks are obliged to implement. The reference implementation of this specification is EclipseLink¹.
Currently there are three main implementations of JPA: Hibernate, EclipseLink and OpenJPA. I was unable to find any reliable statistics about their popularity and market share, but considering that all of them are bundled with popular application servers² and thus are in production use, it makes sense to compare them and try to find out which one provides the most interesting features.
1. http://www.eclipse.org/eclipselink/downloads/ri.php
2. EclipseLink is used as the default JPA provider in GlassFish, OpenJPA in Geronimo and Hibernate in JBoss AS.

2.1 Methodology of the comparison

Comparing JPA providers is a tricky task. Since every single JPA implementation has to obey the standard and implement everything that the standard defines, it might lead to the incorrect conclusion that they are all the same. This is true up to a point; within the boundaries defined by the standard they are all equal. However, each JPA implementation provides features that go beyond the scope of the JPA standard. These vendor-specific features extend the frameworks' functionality in various areas.
I have divided the features that go beyond the scope of the JPA standard into several categories. I will go through all the categories and describe which features a particular implementation offers and which alternatives the others have. In the end, I will create a summary in the form of a short feature matrix summing up the results of the comparison.
I will mainly focus on features which are directly related to JPA and extend it in some way. Therefore, I will not discuss, for example, EclipseLink's support for non-relational data stores, because it is not a JPA extension but merely an additional capability of the framework.
In order to guarantee a fair comparison, I will work with the latest versions of all frameworks; at the time of writing, the latest stable versions available are Hibernate 4.0.0, EclipseLink 2.3.2 and OpenJPA 2.1.1.

2.2 Identifier generation

The JPA standard defines 4 primary key generation strategies³ (table, sequence, identity and auto). All frameworks in the comparison, however, provide additional ways to generate identifiers.
In Hibernate, there is the annotation @GenericGenerator, which creates a non-standard ID generator. Its parameter is either the fully-qualified name of a class implementing IdentifierGenerator or a shortcut for one of the many predefined generators from the Hibernate distribution, such as increment, identity, sequence, hilo, uuid, guid and others. Since Hibernate version 3.2.3, the preferred way of generating identifiers is using TableGenerator and SequenceStyleGenerator, due to better optimization and database portability [9]⁴.
OpenJPA also allows creating user-defined identity generators by implementing the Seq interface [10]⁵. Besides the standard ones, the OpenJPA distribution contains a few additional generators, most interestingly TimeSeededSeq, generating identifiers based on system time, and UUIDHexSeq, generating random hex strings.
In EclipseLink, it is possible to create custom generators by extending Sequence and registering the generator class in the persistence unit configuration (persistence.xml) as follows:

    <properties>
        ...
        <property name="eclipselink.session.customizer"
                  value="com.example.eclipiselink.CustomIdGenerator"/>
        ...
    </properties>

The distribution contains QuerySequence, an implementation of Sequence, which is not only the parent of all the standard sequence generators contained in the EclipseLink distribution, but also serves as a generic mechanism for obtaining identifiers using user-defined queries [11].
3. Java™ Persistence API, Version 2.0, section 11.1.17
4. Hibernate developer guide, section 28.4
5. OpenJPA user's guide, section 9.6
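To make the idea of a custom identifier generator more concrete, the following is a minimal, framework-independent sketch of two generators in the spirit of the time-seeded and random hex generators mentioned above. It does not use the Seq, Sequence or IdentifierGenerator interfaces of the providers; the IdSketch interface and all class names are hypothetical and exist only for illustration.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

final class IdSketchDemo {
    public static void main(String[] args) {
        IdSketch<Long> timeSeeded = new TimeSeededIds(System.currentTimeMillis());
        IdSketch<String> uuidHex = new UuidHexIds();
        System.out.println(timeSeeded.next());
        System.out.println(uuidHex.next());
    }
}

/** Hypothetical contract for this sketch; not part of JPA or of any provider. */
interface IdSketch<T> {
    T next();
}

/** Counter that starts from a seed value, for example the current system time. */
final class TimeSeededIds implements IdSketch<Long> {
    private final AtomicLong counter;

    TimeSeededIds(long seedMillis) {
        this.counter = new AtomicLong(seedMillis);
    }

    @Override
    public Long next() {
        // Thread-safe increment; the first identifier is seed + 1.
        return counter.incrementAndGet();
    }
}

/** Random 32-character lowercase hex string derived from a random UUID. */
final class UuidHexIds implements IdSketch<String> {
    @Override
    public String next() {
        // UUID.toString() yields 36 characters including 4 hyphens.
        return UUID.randomUUID().toString().replace("-", "");
    }
}
```

In a real project, logic of this kind would be wrapped in the provider-specific extension point described above.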

2.3 Performance

Good performance is indeed a fundamental requirement of any persistence solution. In this chapter I will benchmark all the frameworks being compared and measure how fast they perform in various usage scenarios.
For the purpose of this benchmark, I created a sample JPA application that contains several entities, which are mapped to the database schema shown in figure 2.1.

Figure 2.1: ER diagram of sample database schema

The persistence.xml configuration file contains several persistence units, each one configured with a different data source and a different JPA provider. The application also contains a simple testing framework, which is in charge of creating an EntityManagerFactory for a particular persistence unit, initializing⁶ and running tests, measuring execution times and logging the results.
In the benchmark, I will test how the frameworks perform in the default configuration, without any vendor-specific optimizations. Each test will be executed three times and the average time will be presented as the test result. All tests will run on the following configuration:
• Intel Core2 Quad Q9400 2.66GHz
• 4GB RAM
• Xubuntu Linux 11.10, 64-bit
• Oracle JDK 7
• PostgreSQL 9.1.3
• JDBC4 driver, version 9.1-901
To guarantee that the tests of all frameworks run under the same conditions⁷, I disabled automatic schema generation and created the schema manually, as shown in listing 2.1.

    create sequence sample_sequence start 1000 increment 50;
    create table "user" (
        user_id bigint primary key,
        username varchar(32) not null unique,
        enabled boolean not null default true);
    create table users_authorities (
        user_id bigint references "user",
        name varchar(32) not null);
    create table discussion (
        discussion_id bigint primary key);
    create table article (
        article_id bigint primary key,
        created timestamp default now(),
        headline text,
        content text,
        discussion_id bigint not null unique references discussion);
    create table users_articles (
        user_id bigint references "user" (user_id),
        article_id bigint references article);
    create table tag (
        tag_id bigint primary key,
        name varchar(32) not null unique);
    create table articles_tags (
        article_id bigint not null references article,
        tag_id bigint not null references tag);
    create table post (
        post_id bigint primary key,
        created timestamp not null default now(),
        author varchar(32),
        title varchar(32),
        text text);
    create table discussions_posts (
        discussion_id bigint not null references discussion,
        post_id bigint not null unique references post);

Listing 2.1: DDL defining sample database schema

6. Some tests need data already present in the database. The required INSERT statements are, therefore, executed in the initialization phase, the duration of which is not included in the test result.
7. As seen in appendix A.2, OpenJPA generates some additional indexes, which would need to be updated with each insert and, therefore, test execution times would be affected.

In the following sections, I will present the results of the performance benchmarks.
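The measurement procedure described above (three runs per test, with the average reported) can be sketched roughly as follows. This is not the testing framework used in the thesis; the class and method names are hypothetical, and the workload is a placeholder standing in for a JPA operation.

```java
final class BenchmarkSketch {
    /** Runs the test the given number of times and returns each wall-clock duration in ms. */
    static long[] timeRuns(Runnable test, int runs) {
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            test.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }

    /** Arithmetic mean of the measured durations. */
    static double average(long[] millis) {
        long sum = 0;
        for (long m : millis) {
            sum += m;
        }
        return (double) sum / millis.length;
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption); a real test would persist or query entities.
        Runnable test = () -> {
            long acc = 0;
            for (int i = 0; i < 1_000_000; i++) {
                acc += i;
            }
            if (acc < 0) {
                throw new IllegalStateException("overflow");
            }
        };
        System.out.println("average of 3 runs in ms: " + average(timeRuns(test, 3)));

        // The same averaging applied to the three EclipseLink runs reported in table 2.1.
        System.out.println(average(new long[] {3799, 3832, 3825}));
    }
}
```

With only three runs, a single slow run shifts the average noticeably, which is one more reason to read the results below as indicative rather than definitive.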
2.3.1 Batch inserts
The first test benchmarks batch inserts. I persist 10,000 instances of the User entity⁸. The results are presented in table 2.1.

             1           2           3           Average
Hibernate    4678 ms     4744 ms     4768 ms     4730 ms
OpenJPA      43482 ms    44779 ms    45813 ms    44691.33 ms
EclipseLink  3799 ms     3832 ms     3825 ms     3818.67 ms

Table 2.1: Batch inserts on PostgreSQL test results

From the values in table 2.1, we can see that EclipseLink was the fastest, with Hibernate being second. OpenJPA, on the other hand, scored more than 10 times worse. Since such a result is very surprising, I decided to run the exact same test on the same machine, but this time with a MySQL database⁹. The results are presented in table 2.2.

             1           2           3           Average
Hibernate    5128 ms     5042 ms     5137 ms     5132.33 ms
OpenJPA      10790 ms    10696 ms    10617 ms    10701 ms
EclipseLink  3377 ms     3257 ms     3387 ms     3340.33 ms

Table 2.2: Batch inserts on MySQL test results

We can see that the difference is not so big anymore, but OpenJPA still performs much worse than the others. There are some performance optimizations available to tune OpenJPA for batch operations, but in the default configuration its performance was by far the worst of all compared frameworks.
8. Users are persisted with an auto-generated name and no articles attached.
9. The version of MySQL server used in the test was 5.1.61.

2.3.2 Searching by ID
In this test I make use of the values inserted by the previous test. From the 10,000 users inserted, I fetch 1,000 random ones by ID. The results are presented in table 2.3.

             1          2          3          Average
Hibernate    828 ms     863 ms     756 ms     815.66 ms
OpenJPA      5285 ms    5155 ms    5227 ms    5222.33 ms
EclipseLink  691 ms     683 ms     666 ms     680 ms

Table 2.3: Find by ID test results


2.3.3 Basic JPA QL test
In this test I fetch all users stored in the database using a JPA QL query. Table 2.4 presents the results.

             1          2          3          Average
Hibernate    1002 ms    1138 ms    1013 ms    1051 ms
OpenJPA      1196 ms    1086 ms    1078 ms    1120 ms
EclipseLink  1110 ms    1133 ms    1256 ms    1166.33 ms

Table 2.4: Fetch all users using JPA QL test results

2.3.4 Basic criteria API test
This test is similar in nature to the previous one, but I fetch all users using the criteria API instead of JPA QL. The criteria API is a new feature introduced in JPA2. The results are presented in table 2.5.

             1          2          3          Average
Hibernate    1567 ms    1530 ms    1523 ms    1540 ms
OpenJPA      1054 ms    1094 ms    1136 ms    1094.66 ms
EclipseLink  882 ms     764 ms     862 ms     836 ms

Table 2.5: Fetch all users using criteria API test results

An interesting observation from these results is that Hibernate performs significantly slower when using the criteria API instead of JPA QL, whereas EclipseLink, on the other hand, performs slightly faster. The results of OpenJPA are about the same.

2.3.5 Aggregate function
In this test I measure how fast counting entities is using both JPA QL and the criteria API. In the test initialization phase, 1,000 users are inserted into the database, but only half of them have their account set as enabled. Each user has 15 articles and each article has 3 comments. In the test itself, I perform a complex join over the User, Article, Discussion and Post tables, then a selection by the User's enabled property, and count the returned entities. The results are presented in tables 2.6 and 2.7.
From the results presented in the tables below, it is interesting to see that the same query is significantly faster using the criteria API than using a standard JPA QL query.

             1         2         3         Average
Hibernate    306 ms    250 ms    245 ms    267 ms
OpenJPA      354 ms    400 ms    407 ms    387 ms
EclipseLink  306 ms    295 ms    349 ms    316.66 ms

Table 2.6: Complex join using JPA QL test results

             1         2         3         Average
Hibernate    127 ms    115 ms    100 ms    114 ms
OpenJPA      176 ms    178 ms    177 ms    177 ms
EclipseLink  104 ms    107 ms    104 ms    105 ms

Table 2.7: Complex join using criteria API test results

2.3.6 Performance summary


From the test results presented, EclipseLink is more or less on par with Hibernate. Out of seven executed tests, EclipseLink took first place five times and Hibernate two times. OpenJPA, on the other hand, performs the worst in all tests except two. Especially in the batch insert test, its performance is far behind the competitors. In the other tests the results were quite similar. Therefore, poor performance of applications using JPA is more likely caused by a poor database design or incorrect use of the persistence framework (e.g. construction of queries which demand complex joins) than by the framework itself.
It is important to note that all the frameworks are highly configurable and offer various performance enhancements for specific usage scenarios. Therefore, it is important not to overestimate the results of this benchmark. It does provide, however, some view of the performance of the compared JPA implementations in their default configurations.

2.4 Type conversion

The JPA specification does not define any kind of type conversion. For example, if there is a string field in the database which stores boolean values as "Y" and "N" strings, there is no way to map it (directly) to a Java boolean. All JPA implementations, however, provide extensions which allow mapping various database types to Java types and also allow creating user-defined types.
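The "Y"/"N" example boils down to a pair of conversion functions. The following plain-Java sketch shows only that logic, without any provider API; the class and method names are hypothetical. Each provider offers its own extension point into which such logic would be plugged.

```java
final class YesNoConverterSketch {
    /** Java boolean to its database string representation. */
    static String toDatabase(boolean value) {
        return value ? "Y" : "N";
    }

    /** Database string to Java boolean; rejects anything other than "Y" or "N". */
    static boolean toJava(String column) {
        if ("Y".equals(column)) {
            return true;
        }
        if ("N".equals(column)) {
            return false;
        }
        throw new IllegalArgumentException("Expected \"Y\" or \"N\" but got: " + column);
    }

    public static void main(String[] args) {
        System.out.println(toDatabase(true)); // prints Y
        System.out.println(toJava("N"));      // prints false
    }
}
```

Rejecting unexpected values is a design choice of this sketch; a real converter might instead map null or lowercase values, depending on what the legacy schema actually contains.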
In Hibernate, there is a Type interface. All the types Hibernate
recognizes implement this interface. So, in Hibernate there are classes
like CalendarType mapping Calendar to a datetime, ClassType,
which maps Java Class objects to varchars, etc. However, for creating new custom types, it is generally not recommended to implement
Type directly because it would make custom type converters tightly
coupled with the Type interface and future changes (such as added
or removed methods) would break all custom type converters [12].
For this reason, there is an interface UserType that should be used
for creating custom type converters, which are later adapted to Type
using CustomType.
OpenJPA also provides support for creating custom mappings.
There is an interface ClassStrategy which can be used for creating
mapping between custom classes and database schema. Such class
strategy can be then congured using @Strategy mapping annotation. For creating various custom eld mappings, OpenJPA provides ValueHandler and FieldStrategy interfaces. The latter is a
bit more complicated to implement, but provides more exibility
when interacting with the database10.
Since creating custom field strategies might in many cases be overly
complicated, especially when only quite simple value transformations are
needed, OpenJPA provides a mechanism called externalization. Using the @Externalizer annotation, we might specify either an instance method of the mapped class or a static method of any class
which should be invoked to transform a value to its database representation. Its counterpart is the @Factory annotation, which specifies
how the transformation from the database representation to the custom type is performed. We might pass either nothing, or an instance
method name of the custom type, or any static method which does
the conversion. If nothing is passed, the constructor of the
custom type is invoked.
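To make the externalization idea more concrete, the following is a minimal sketch in plain Java of a value type that could be used with this mechanism for the "Y"/"N" example mentioned above. The class YesNoFlag and its method names are hypothetical and do not come from any of the discussed frameworks. In an OpenJPA entity, a field of this type would presumably be annotated with @Externalizer naming toDatabase and with @Factory (either without a value or naming fromDatabase); the exact annotation values are an assumption and should be checked against the OpenJPA guide.

```java
// Hypothetical value type wrapping a boolean stored as "Y"/"N" in the database.
final class YesNoFlag {
    private final boolean value;

    // Constructor taking the database representation. According to the
    // description above, this is what is invoked when nothing is passed
    // to @Factory.
    YesNoFlag(String databaseValue) {
        if (!"Y".equals(databaseValue) && !"N".equals(databaseValue)) {
            throw new IllegalArgumentException("Expected Y or N, got: " + databaseValue);
        }
        this.value = "Y".equals(databaseValue);
    }

    // Static conversion method, an alternative to the constructor
    // (the kind of method a factory annotation could name).
    static YesNoFlag fromDatabase(String databaseValue) {
        return new YesNoFlag(databaseValue);
    }

    // Instance method producing the database representation
    // (the kind of method an externalizer annotation could name).
    String toDatabase() {
        return value ? "Y" : "N";
    }

    boolean isSet() {
        return value;
    }
}
```

The conversion logic itself does not depend on OpenJPA, so the same two methods could also be wrapped by a Hibernate UserType or an EclipseLink Converter.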
In EclipseLink, the primary interface for defining custom converters is Converter. Applications can either create custom implementations of this interface or use some of the predefined converters from
the EclipseLink distribution. Some examples of the predefined converters are [13]:
ObjectTypeConverter is the simplest converter available. It is used for custom mapping of database values to Java values when the formats differ.
TypeConversionConverter can be used for explicit mapping of data source types to Java types.
SerializedObjectConverter maps various binary formats into database BLOBs.
When a proper converter is configured, it can be attached to the
mapped attribute using the @Convert annotation.

2.5 Caching support

The JPA2 specification comes with basic support for second-level
cache11. Second-level cache is the cache at the EntityManagerFactory level, so it contains entities from multiple persistence contexts.
10. OpenJPA users guide, section 7.10.3.2
11. JavaTM Persistence API, Version 2.0, section 3.7

Caching can be configured using the shared-cache-mode configuration element
in persistence.xml. Possible values are:

ALL

NONE

ENABLE_SELECTIVE

DISABLE_SELECTIVE

UNSPECIFIED

There is also the @Cacheable annotation used for specifying the entity-level caching mode in case either the ENABLE_SELECTIVE or DISABLE_SELECTIVE global caching mode has been set. In case of the UNSPECIFIED cache mode setting, provider-specific rules apply. Since caching
is handled in such a minimalistic and rather abstract way, various vendor-specific extensions exist.
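As an illustration of the standard configuration described above, a persistence.xml fragment enabling selective caching might look as follows. This is only a sketch; the persistence unit name samplePU is hypothetical.

```xml
<persistence xmlns="http://java.sun.com/xml/ns/persistence" version="2.0">
  <persistence-unit name="samplePU">
    <!-- Only entities explicitly annotated with @Cacheable are cached -->
    <shared-cache-mode>ENABLE_SELECTIVE</shared-cache-mode>
  </persistence-unit>
</persistence>
```

With this setting, an entity takes part in the second-level cache only if its class carries the @Cacheable annotation; with DISABLE_SELECTIVE the logic is reversed and entities are excluded using @Cacheable(false).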
The approach recommended in the Hibernate documentation is
delegating the caching functionality to specialized caching tools12.
Thus, Hibernate neatly integrates with the most popular caching frameworks like EhCache or Hazelcast, simply by setting the configuration property hibernate.cache.region.factory_class to an appropriate
cache region factory of the selected caching framework.
OpenJPA comes with its own data cache implementation13, which can
be turned on using the openjpa.DataCache configuration property. OpenJPA also contains a transaction-events notification framework14, which
can be used for cache synchronization between nodes in a distributed
environment.
EclipseLink also contains an integrated second-level cache implementation and does not rely on any third-party framework. Like
OpenJPA, it also supports caching in a clustered environment; using the
configuration property eclipselink.cache.coordination.protocol it is
possible to specify which protocol should be used for cache coordination between nodes. Possible options described in the documentation [13] are RMI and JMS.
12. Hibernate developer guide, section 21.2
13. There is a plug-in integrating EhCache with OpenJPA, but since it is not even
mentioned in the official documentation, I will not discuss it further.
14. OpenJPA users guide, section 12.2

To sum up the caching solutions used in the JPA implementations: Hibernate tries to delegate caching to specialized tools, whereas OpenJPA and EclipseLink integrate second-level caching into the core of
the framework. The undeniable advantage of Hibernate's approach
is that it does not try to reinvent the wheel. Since specialized
caching frameworks are already mature and sophisticated, it is generally a good idea to utilize them as general second-level JPA cache
solutions. A slight disadvantage, however, is that a third-party framework is required, which might increase the overall complexity of the
architecture.

2.6 Entity lifecycle and transactional events

The JPA specification defines seven entity lifecycle events: pre/post-persist,
pre/post-remove, pre/post-update and post-load; and two ways of listening to such events: either by adding an appropriate lifecycle annotation to an entity method or by specifying an EntityManagerFactory-scoped callback listener15. Even though these callbacks should be fine
in most cases, all vendors provide alternative ways of listening to
events in the persistence layer.
Hibernate has the concept of interceptors16. Interceptors can be
either Session-scoped or SessionFactory-scoped and provide some
additional callbacks, but with the limitation that there can be only
one per Session/SessionFactory. Hibernate also contains an event architecture, which is superior to the interceptor capabilities and can be used
to listen to even more fine-grained events raised by the Hibernate
session. All supported events are contained in the EventType enum.
OpenJPA supports listening to transaction-related events via instances of the TransactionListener interface. Such instances are registered at OpenJPAEntityManagerSPI, an OpenJPA-specific extension
of EntityManager. Besides transaction-related events, it is also possible to register a LifecycleListener which supports callbacks in addition to those from the JPA2 standard, such as DetachListener for notification when an entity becomes detached [14]. However, the documentation does not mention this interface at all, so it might not be
15. JavaTM Persistence API, Version 2.0, section 3.5
16. Hibernate developer guide, section 14

intended for public use or might be changed in future releases.
EclipseLink, on the other hand, contains support for events which
occur in the session (the EclipseLink-specific implementation of EntityManager). There is an interface SessionEventListener, which can
be registered with a session to receive notifications about the following
session-related events:

pre/post-transaction commit

pre/post-transaction rollback

pre/post-query execution

the descriptor for an entity being persisted is missing

...

2.7 Schema generation

In order to keep the specification clean and simple17, JPA does not
force vendors to generate a database schema18. However, the authors
of the specification took schema generation into account and integrated various metadata mappings which can be used during schema
generation, such as the nullable or unique annotation properties.
All the compared implementations provide schema generation
functionality. Generated statements can either be sent directly to the
database or saved into a file for manual execution. DROP statements can also be generated, so during development it is possible to have
all tables removed and generated again on each startup. This ensures that the application always starts in the same state with an
empty database. However, even though automatic schema generation might be convenient in the development phase, it is often a
17. Another reason why the specification does not contain any details of the
schema generation is that it is a very vendor-specific issue. The specification would
have to describe what tables generated from entities should look like on a particular database platform, which datatypes or constraints should be used, etc. Since
all these elements differ greatly from database to database, the specification
avoids it entirely.
18. JavaTM Persistence API, Version 2.0, page 355

source of problems when dealing with incremental schema upgrades.
Therefore, it is generally recommended to avoid it in production environments when possible and rather use database migration tools
like Liquibase19, Flyway20 or MyBatis Migrations21.
In Hibernate, it is possible to turn on schema generation using the
configuration property hibernate.hbm2ddl.auto, which can be set to
one of the following values: validate, update, create and create-drop.
Then, before the EntityManagerFactory is created, Hibernate generates DDL statements and executes them on the database. If this is not
the desired behaviour, there is a SchemaExport class in the Hibernate
distribution, which is responsible for generating the schema and can
be executed manually (either programmatically or via the command line)
to have DDL statements generated into a file.
OpenJPA can also be configured to automatically generate the database schema on application startup by setting the configuration
property openjpa.jdbc.SynchronizeMappings to buildSchema. Like
in Hibernate, the schema generation can be triggered manually, using the MappingTool utility. However, OpenJPA goes even further in
this area and, as the only one of the compared implementations,
also allows generating an object model from an existing database
schema22. This feature is usually only provided by advanced UML
modelling tools, such as Enterprise Architect from Sparx Systems23.
In EclipseLink, the schema generation can be turned on using the
persistence.xml property eclipselink.ddl-generation. There is also an
option to have DDL statements not only executed directly on the database, but also exported into a file. This is done by setting the
configuration property eclipselink.create-ddl-jdbc-file-name to a target file path and eclipselink.ddl-generation.output-mode to both.
In appendix A, I provide DDL statements of the schema used in
the sample benchmark application from section 2.3, automatically
generated by Hibernate, OpenJPA and EclipseLink, respectively.
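The configuration properties mentioned in this section can be put side by side in a persistence.xml fragment. The following sketch is for illustration only: in a real persistence unit, only the properties of the provider actually in use would be present, and the chosen values and the file name create.sql are my assumptions, not values taken from the benchmark application.

```xml
<properties>
  <!-- Hibernate: create the schema on startup and drop it on shutdown -->
  <property name="hibernate.hbm2ddl.auto" value="create-drop"/>

  <!-- OpenJPA: synchronize the schema with the mappings on startup -->
  <property name="openjpa.jdbc.SynchronizeMappings" value="buildSchema"/>

  <!-- EclipseLink: recreate the tables, execute the DDL on the database
       and also write it into a file -->
  <property name="eclipselink.ddl-generation" value="drop-and-create-tables"/>
  <property name="eclipselink.create-ddl-jdbc-file-name" value="create.sql"/>
  <property name="eclipselink.ddl-generation.output-mode" value="both"/>
</properties>
```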

19. http://www.liquibase.org/
20. http://code.google.com/p/flyway/
21. http://code.google.com/p/mybatis/
22. OpenJPA users guide, section 7.2
23. http://www.sparxsystems.com.au/


2.8 Support for stored procedures

A stored procedure is a group of SQL statements that is used to
encapsulate a set of operations or queries to execute on a database
server [15].
The JPA standard does not mention stored procedure support at all.
However, stored procedures are executed using ordinary SQL, so as
long as the stored procedure returns nothing or a result set (which is
properly mapped to entities using @SqlResultSetMapping on the
JPA side24), it can be called using JPA native queries with all JPA
providers. EclipseLink, however, contains some additional, beyond-the-standard features for accessing stored procedures [13].
Let us consider the following very simple MySQL stored procedure:
CREATE PROCEDURE calculate_item_count(OUT result INT)
BEGIN
select count(item_id) into result from Item;
END

Listing 2.2: Sample stored procedure


The stored procedure does nothing more than calculate the number of items in the table Item and store it in the output parameter result.
In EclipseLink, it is possible to call a stored procedure using StoredProcedureCall as follows:
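import javax.persistence.EntityManager;
import org.eclipse.persistence.queries.StoredProcedureCall;
import org.eclipse.persistence.sessions.Session;

EntityManager em = emf.createEntityManager();
StoredProcedureCall spc = new StoredProcedureCall();
spc.setProcedureName("calculate_item_count");
// "result" is a scalar OUT parameter of type INT, not a cursor
spc.addNamedOutputArgument("result", "result", Integer.class);
em.getTransaction().begin();
// obtain the native EclipseLink session behind the entity manager
Session s = em.unwrap(Session.class);
s.executeSelectingCall(spc);
em.getTransaction().commit();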

24. JavaTM Persistence API, Version 2.0, section 3.8.15

Using this approach, it is not necessary to deal with SQL directly
in stored procedure calls; the procedure is executed by its name only.
Besides StoredProcedureCall, EclipseLink also provides an annotation-based approach, similar to named queries. Using the annotation @NamedStoredProcedureQuery, it is possible to define a named
stored procedure call, which can later be executed in a similar manner
to a standard @NamedQuery or @NamedNativeQuery:
Query query = entityManager.createNamedQuery("queryName");
query.getResultList();

So, having defined the stored procedure from listing 2.2, the named
stored procedure call definition would look as follows:
@NamedStoredProcedureQuery(
    name = "getItemCount",
    procedureName = "calculate_item_count",
    parameters = @StoredProcedureParameter(
        queryParameter = "result",
        name = "result",
        direction = Direction.OUT))

EclipseLink has the best support for stored procedures of all
compared JPA implementations. Using Hibernate, the named native
query definition for a stored procedure call would look like this:
@NamedNativeQuery(name = "getItemCountHibernateWay",
query = "? = call calculate_item_count()")

However, this code for the procedure calculate_item_count, as defined
in listing 2.2, would fail because Hibernate does not yet support native scalar queries; the code would lead to a NotYetImplementedException.
OpenJPA also claims support for stored procedures25, but I was
unable to get any of them running.

2.9 Integrating with other frameworks

Another important decision factor for choosing a JPA implementation is its support for other frameworks which would extend its capabilities even further.
25. It is very briefly mentioned in the documentation, without any examples or
further details.

Hibernate is known for excellent integration with various third-party frameworks. Hibernate is not just a persistence framework; it
is a sort of ecosystem with many frameworks built around Hibernate ORM. In section 2.5, I already discussed its integration
with caching frameworks like EhCache or Hazelcast. Two other
important and useful extensions are Hibernate Search and Hibernate
Validator.
Hibernate Validator is the reference implementation of the specification JSR 303: Bean Validation [16]. JSR 303 is a standard API for
declarative validation, which JPA already supports for validating
entities upon lifecycle events26. The configuration is done via the property javax.persistence.validation.mode in persistence.xml. The possible values are AUTO, CALLBACK or NONE. Hibernate provides
an additional option, DDL.
When the value of this property is set to either AUTO or DDL
and Bean Validation is present on the classpath, the validation metadata are also used in schema generation. When an entity attribute is
decorated with one of the supported annotations from this specification, Hibernate reflects such a validation constraint in the generated
schema. For example, if an attribute of type long is annotated with the
JSR-303 annotation @Max(30), the Hibernate schema generator adds a
database check constraint to ensure that values in the column are
less than or equal to 30. There are more supported JSR-303 annotations,
such as @Min, @NotNull, @Size, etc. [17]27
Hibernate Search is a framework built on top of Apache Lucene
aiming to provide full-text search capabilities to the domain model.
It works with both native Hibernate API and Hibernate EntityManager.

2.10 Licenses
When a company is considering implementing their enterprise solution on top of some ORM framework, the character of its license
is very important. Fortunately, all of the compared frameworks are
released under permissive and business-friendly licenses. Hibernate
26. JavaTM Persistence API, Version 2.0, section 3.6
27. Hibernate Validator reference documentation, section 2.4.1

uses LGPL v2.1 (footnote 28), OpenJPA uses Apache License v2.0 (footnote 29) and EclipseLink is dual-licensed under Eclipse Public License v1.0 and Eclipse
Distribution License v1.0 (footnote 30). Dual licensing means that users can choose
which license suits their business needs best.

2.11 Documentation quality


During my work on this thesis I read the documentation extensively on each particular topic discussed here and, therefore, I
am in a position to offer some comparison of the documentation quality of all the frameworks.
Hibernate has high-quality documentation. With every new version of the framework, there is a new document describing its features, it is possible to find documentation of older releases, and everything is clearly explained with examples. Since Hibernate is the
most widely used implementation, there are also many examples all
over the internet and users can ask questions in discussion forums31.
OpenJPA also has high-quality documentation; everything can be found there. However, it is unfortunately not as rich in examples
as Hibernate's; it is purely a reference guide. Like every Apache project,
OpenJPA provides mailing lists for users' questions.
For EclipseLink, there is quite a lot of documentation, but its problem is fragmentation. Indeed, it is possible to find answers to
most questions, but it took me a significant amount of time compared to Hibernate or OpenJPA. Moreover, since the documentation
is not versioned, it is sometimes fairly difficult to distinguish which
functionality is new in the EclipseLink implementation of JPA2 and which
was already present in EclipseLink JPA1. EclipseLink also has its
support forums32.

28. http://www.hibernate.org/license
29. http://openjpa.apache.org/license.html
30. http://wiki.eclipse.org/EclipseLink/FAQ/General
31. http://forums.hibernate.org
32. http://www.eclipse.org/forums/index.php/f/111


2.12 Build systems


Nowadays, Maven can be considered the de facto industry standard build and project management tool. EclipseLink and OpenJPA
both use it for their builds. Hibernate, however, switched in version 4.0
to Gradle, which is a build tool written in Groovy. The Hibernate creators switched to Gradle hoping to simplify the build process
and get rid of various other build-related problems33.

2.13 Summary
In this section I discussed various vendor-specific extensions of the
JPA standard. I summarize this chapter in the following brief feature matrix:
Feature                                   Hibernate    OpenJPA    EclipseLink
Custom identifier generation              Yes          Yes        Yes
Performance in the default configuration  2.           3.         1.
Built-in event notification framework     Powerful     Yes        Yes
Support for custom types                  Yes          Yes        Yes
Schema generation                         Yes          Powerful   Yes
Caching                                   Third-party  Custom     Custom
Nonstandard stored procedures support     No           No         Yes
Third-party frameworks support            Yes          No         No
Documentation quality                     High         High       Moderate
Business-friendly license                 Yes          Yes        Yes

Table 2.8: Feature matrix


It is important to note that the list of features presented in this
chapter is by no means exhaustive; I just described important features which are most likely to be used in real-world applications
using JPA. All implementations contain additional, more or less corner-case features or performance enhancements, the description of which
would be beyond the scope of this thesis.
33. A deeper explanation of the reasons behind the transition to Gradle can be found
at https://community.jboss.org/wiki/GradleWhy.


3 Experimental migration of JPA applications


In this chapter, I will take sample applications of all compared implementations, migrate them to the other two and test them on PostgreSQL,
MySQL and Oracle. I will try to use standard features where possible and fall back to vendor-specific extensions only when necessary.
Features which can neither be migrated to the standard API
nor have counterparts in the other frameworks will be removed
entirely.
All migrated projects were tested on MySQL 5.5.22, PostgreSQL
9.1.3 and Oracle XE 11.2. Since all the projects use automatic schema
generation, there were no real issues with database portability;
the only exception was identifier generation. Both PostgreSQL and
Oracle support sequences, so I could use the sequence generation strategy there. In MySQL, however, due to the missing support for sequences, I had to switch to the table generation strategy1.

3.1 Migrating from Hibernate

Since Hibernate does not ship with any official reference application, I searched SourceForge2 for projects using Hibernate
which would be suitable candidates for experimental migration. In
the end, I decided on open-forum3 because it is already managed by
Maven and has a convenient and easily understandable project structure. Open-forum describes itself as an open-source forum engine
written in Java. The project is not yet finished and quite a lot of functionality is still not implemented, but as an example project for experimental migration it is fine.
Open-forum uses the Spring framework, JSF2 and Hibernate. An interesting fact is that it does not use JPA at all, but rather relies on the
native Hibernate API. Therefore, it will be necessary to migrate the complete persistence logic to the standard API. The project also depends
1. I might have chosen the AUTO generation strategy, which automatically determines the right one for the particular database, but since I prefer to have control
over what is going to be generated, I configured the generation strategy manually.
2. http://www.sourceforge.net/
3. http://sourceforge.net/projects/open-forum/

on Hibernate Search, but the search functionality is not yet implemented. If it were, however, it would not be possible to migrate
the project at all, because Hibernate Search does not work with any
JPA provider other than Hibernate, as I already discussed in section 2.9.
The migration of this project consists of the following steps:

Entirely remove the Hibernate dependency from the Maven configuration file.

Modify the Spring configuration files to bootstrap the JPA entity manager factory instead of the Hibernate session factory. This is done via a LocalContainerEntityManagerFactoryBean bean defined in the Spring application context.

Change Hibernate-specific entity annotations. Since the project was already using standard annotations from the package javax.persistence where possible, it was only necessary to migrate the @LazyCollection annotation, which I replaced with the fetch attribute of the corresponding @ManyToMany annotations.

Replace classes from the native Hibernate API, such as Query or Session, with their standard counterparts.

Rewrite HQL queries, because their syntax differs in some aspects from JPA QL.

Create the persistence.xml configuration file and properly define a new persistence unit and its properties.
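The second step of the list above can be sketched as the following fragment of a Spring application context (placed inside the beans root element). It is not taken from the open-forum sources; the bean ids and the persistence unit name openForumPU are hypothetical, and a real configuration would typically also wire a datasource.

```xml
<!-- Bootstraps the JPA EntityManagerFactory from persistence.xml -->
<bean id="entityManagerFactory"
      class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
  <property name="persistenceUnitName" value="openForumPU"/>
</bean>

<!-- Replaces the Hibernate-specific transaction manager -->
<bean id="transactionManager"
      class="org.springframework.orm.jpa.JpaTransactionManager">
  <property name="entityManagerFactory" ref="entityManagerFactory"/>
</bean>
```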

Migration of this project went without any serious complications. It was not even necessary to use any vendor-specific extensions
(with the exception of a different configuration property for schema
generation).

3.2 Migrating from OpenJPA

Unlike Hibernate, OpenJPA has two sample applications
available at its website4: OpenBooks and OpenTrader. I first tried
to migrate OpenTrader, which is a GWT web application. Unfortunately, I was unable to get even its original OpenJPA-based
version running, due to some GWT-related errors. Therefore, I will migrate
OpenBooks in this section.
OpenBooks uses Maven for dependency management. Therefore,
the first step of the migration process was removing all OpenJPA-related dependencies and adding Hibernate EntityManager. After
I imported the project into the IDE and tried to compile it, I received more than a hundred compilation errors. OpenBooks unfortunately relies heavily on the native OpenJPA API.
The next task was migrating the JPA configuration in persistence.xml.
I changed the provider of the persistence unit to Hibernate, removed
OpenJPA-specific configuration options and replaced them with their
Hibernate counterparts where possible. The configuration options not
applicable in Hibernate, such as automatic enhancement of persistent classes, were removed entirely.
Some errors were related to missing metamodel classes required
for the type-safe criteria API5, a new feature introduced in JPA2. The
original OpenJPA-based project used an Ant task to generate such
classes, so in its migrated version, I also had to generate them. There is a subproject of Hibernate called Hibernate
Metamodel Generator6 which provides an annotation processor exactly for this purpose. The metamodel generator can be run as an
Ant task, as a Maven plugin or simply from the command line. Since
my IDE supports triggering annotation processors at compile time,
I relied on that and thus did not have to execute any Ant tasks
or configure additional Maven plugins.
Another group of compilation errors complained about missing OpenJPA-specific classes on the classpath. Since the number of
missing classes was very large and the application API was heavily
4. http://openjpa.apache.org/samples.html
5. JavaTM Persistence API, Version 2.0, section 6.4
6. http://www.hibernate.org/subprojects/jpamodelgen.html

relying on them, I came to the conclusion that the OpenBooks application cannot be migrated to Hibernate (or any other JPA provider)
without rewriting a major part of the code and reconsidering its
design concepts. It is simply not possible to replace the missing OpenJPA classes with alternatives from Hibernate which would do
more or less the same thing. OpenJPA-specific classes are spread
throughout the application code, classes using them depend on other classes using them, and removing a specific class or method
from the application results in a long chain of broken dependencies.
The lesson learned from the attempted migration of OpenBooks to Hibernate can be useful when designing new applications based on JPA. If
there is even the slightest chance that it might be necessary to migrate
the project to a different JPA provider in the future, the developers
should try hard to stick with the standard API and avoid vendor-specific extensions where possible. Otherwise, the migration process will likely
be painful and costly, because the native APIs of different JPA providers
do not work as simple replacements for each other. Each JPA provider
comes with some concepts of its native API which cannot simply
be emulated in a different JPA provider.

3.3 Migrating from EclipseLink

As a sample application for migration from EclipseLink, I will use the
jpa.employee sample, which can be found in the EclipseLink SVN repository7. It is a project focused on the management of employees and
projects; it uses JSF as the frontend technology and EJB for the service tier. Its domain model is not very complicated, but it is perfectly fine
for showing interesting JPA and EclipseLink features.
In order to work with the application (and later with its migrated
copies) in a convenient way, I first had to polish it a bit (add Maven
for builds and dependency management, refactor the project structure,
etc.). However, I was not able to successfully start the application
on various application servers8 until I installed JBoss AS7, where the
deployment process was entirely painless.
7. http://dev.eclipse.org/svnroot/rt/org.eclipse.persistence/trunk/examples
8. I tried to run it on GlassFish and Apache TomEE.

Since the project uses a container-managed datasource, there
was no need to configure it in the application; I just configured the
datasource in the JBoss administration console9.
During the migration, I had to rewrite or change the following
parts of the code:

The sample application used the EclipseLink-specific query hint QueryHints.FETCH, which "configures the query to optimize the retrieval of the related objects, the related objects will be joined into the query instead of being queried independently" [11]. Since there is no direct alternative in either Hibernate or OpenJPA, I removed this optimization hint entirely.

There was a very EclipseLink-specific way of capturing logged SQL statements (using Java reflection). I removed this code entirely and introduced a cleaner way of logging SQL statements using the org.hibernate.SQL logger. In OpenJPA, there is a configuration property openjpa.Log, which is used for SQL statement logging.

I had to remove the eclipselink-orm.xml configuration file, which is also EclipseLink-specific. It contained various named query definitions (which I declared using annotations at entity level), some additional persistence unit configuration and, most importantly, entity mapping metadata, which I added to the entities using the standard annotation-based approach.

The application was saving the gender field of employees in the format "M" and "F" for males and females, respectively. It was using the EclipseLink-specific object-type-converter to convert such
strings stored in the database into instances of Gender on the
Java side. Since such a conversion is not possible in pure JPA, I
had to use vendor-specific extensions of Hibernate and OpenJPA.

    In Hibernate, I implemented the UserType interface to create a custom type converter, which is responsible for the serialization of the Java enum into its database VARCHAR representation and back.

    In OpenJPA, I annotated the gender field with @Externalizer("toString") and @Factory, as described in section 2.4. Since I did not specify any additional parameter for @Factory, the constructor will be called to obtain a new Gender instance.

For OpenJPA, I enabled unenhanced persistent classes at runtime using the configuration property openjpa.RuntimeUnenhancedClasses. Even though the classes should be enhanced automatically during deployment on the JBoss application server, I was receiving an error that they were not. Although I could have run compile-time enhancement using Ant or Maven (as I did in the performance benchmark), I decided to simply allow classes to be unenhanced in this case.

9. I find such a configuration very convenient, because it is no longer necessary to bundle the database driver with the application; the application just expects that the connection is there.

3.4 Migration summary

In this section, I migrated various applications between different JPA
implementations. Two out of three migrations were successful, one
was not. Surprisingly, the unsuccessful one led to probably the most
interesting conclusion, because it shows that migrating an application between JPA providers is not easy if it heavily utilizes the native
API.


4 Automatic migration tool


The practical part of the thesis is an application that supports
migrating existing projects from OpenJPA and EclipseLink to Hibernate. I call it JPA Migration and it is written in Scala.

4.1 The application architecture

The architecture of the application was designed with emphasis on
its extensibility. Migration of a project is divided into migrating individual source files using so-called processors. They are responsible for locating various vendor-specific features and displaying hints on
how such features could be migrated to Hibernate1. If there is a direct
counterpart in Hibernate, the feature is migrated automatically2.
For both supported JPA implementations (OpenJPA, EclipseLink),
there are several processors configured in the application configuration. All processors mix in3 the trait Processor, which defines two
abstract methods:

appliesFor(filename: String) returns a boolean value indicating whether the processor should be executed for a file with a particular filename.

parse(source: String) receives the file content as a function argument, analyses it, migrates the constructs it is responsible for, and returns a tuple, where the first element is the new migrated source4 and the second element is a list of instances of MigrationTaskResult, each of which provides a description of a single step performed during the migration.
¹ If it is possible to migrate a vendor-specific feature directly to the standard API, this is preferred over migrating to Hibernate.
² Currently, only standardized JPA persistence.xml properties and some query hints whose semantics are the same in multiple frameworks are migrated automatically.
³ Mixing in a Scala trait is roughly the counterpart of implementing an interface in Java.
⁴ When no migration is done, the returned source is the same as the input.



There are two additional abstract classes, PropertiesXmlProcessor and ClassProcessor, which both mix in Processor and serve as base classes of all processors working with XML or Java source files, respectively. They provide various convenience methods for all the processors of the same kind. Figure 4.1 shows the class diagram describing all the classes participating in the migration process.
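The contract described above can be sketched in a few lines of Scala. This is only an illustration under stated assumptions: the names Processor, appliesFor, parse and MigrationTaskResult come from the text, while the single description field, the object ConnectionUrlProcessor and the plain string replacement are hypothetical simplifications (the text states that the application manipulates XML files through the DOM).

```scala
// Hypothetical sketch: only Processor, appliesFor, parse and MigrationTaskResult
// are names from the text; the field and the sample processor are assumptions.
final case class MigrationTaskResult(description: String)

trait Processor {
  /** Should this processor run for the file with the given name? */
  def appliesFor(filename: String): Boolean

  /** Returns the (possibly unchanged) source and a description of each step performed. */
  def parse(source: String): (String, List[MigrationTaskResult])
}

/** Illustrative processor renaming one OpenJPA property to its standard JPA 2.0 name. */
object ConnectionUrlProcessor extends Processor {
  private val vendorName = "openjpa.ConnectionURL"
  private val standardName = "javax.persistence.jdbc.url"

  def appliesFor(filename: String): Boolean =
    filename.endsWith("persistence.xml")

  def parse(source: String): (String, List[MigrationTaskResult]) =
    if (source.contains(vendorName))
      (source.replace(vendorName, standardName),
        List(MigrationTaskResult("Replaced " + vendorName + " with " + standardName)))
    else
      (source, Nil) // nothing migrated: the input is returned unchanged
}
```

Returning the source together with the list of results keeps each processor free of side effects, which is what allows processors to be chained later in the migration process.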
The migration process itself consists of the following steps:
1. The application starts. It receives four command-line arguments:
   • the project directory
   • an output directory
   • the location of the file where the migration report should be written
   • the name of the current JPA provider (eclipselink or openjpa)
2. The arguments are validated (e.g. whether the project directory exists, whether the output directory is writeable, some additional access rights are checked, etc.).
3. The migration itself begins. The whole input directory is copied to the output destination.
4. The list of available processors for the particular JPA implementation is loaded from the application configuration.
5. The output directory is walked recursively and for each file, all processors whose appliesFor() method returns true are executed one after another⁵; each consecutive processor receives the output of the previous processor as its input argument. In this way, the source code is migrated step by step and the lists of MigrationTaskResult instances (the second element of the tuple returned by the parse() method) are concatenated, forming a complete migration report for the particular file, represented by an instance of the class Report.
6. The migrated source code is written back to the source file.
7. Instances of Report created during the migration of individual source files are collected and passed as an argument to the templating engine Scalate, which is responsible for creating the HTML report using a predefined template.

⁵ Their method parse() is executed.
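The chaining of processors for a single file (step 5) can be expressed as a left fold. The following is a minimal sketch, not the application's actual code: Processor, MigrationTaskResult and Report are names from the text, whereas PipelineSketch, migrateFile, replacing and all fields are hypothetical and chosen only for illustration.

```scala
// Hypothetical sketch of step 5; names other than Processor, appliesFor,
// parse, MigrationTaskResult and Report are assumptions made for illustration.
object PipelineSketch {
  final case class MigrationTaskResult(description: String)
  final case class Report(filename: String, results: List[MigrationTaskResult])

  trait Processor {
    def appliesFor(filename: String): Boolean
    def parse(source: String): (String, List[MigrationTaskResult])
  }

  /** Runs every applicable processor in order, feeding each one the previous output. */
  def migrateFile(filename: String, source: String,
                  processors: List[Processor]): (String, Report) = {
    val start = (source, List.empty[MigrationTaskResult])
    val (migrated, results) = processors
      .filter(_.appliesFor(filename))
      .foldLeft(start) { case ((current, collected), processor) =>
        val (next, stepResults) = processor.parse(current)
        (next, collected ++ stepResults) // concatenate the per-processor results
      }
    (migrated, Report(filename, results))
  }

  /** Tiny processor used only to exercise the sketch: replaces one token with another. */
  def replacing(suffix: String, from: String, to: String): Processor = new Processor {
    def appliesFor(filename: String): Boolean = filename.endsWith(suffix)
    def parse(source: String): (String, List[MigrationTaskResult]) =
      if (source.contains(from))
        (source.replace(from, to), List(MigrationTaskResult(from + " -> " + to)))
      else (source, Nil)
  }
}
```

For example, calling PipelineSketch.migrateFile("X.java", "a", processors) with two ".java" processors that replace "a" by "b" and "b" by "c" yields the source "c" and a report with two entries, while processors registered for other file names are skipped.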

4.2 Java source files parsing

While parsing XML configuration files is rather straightforward and involves only simple DOM tree manipulation, parsing and analysing Java source files is much more complicated. It is necessary to obtain an abstract representation of a Java source file, an abstract syntax tree (AST). The AST is then recursively searched for occurrences of vendor-specific constructs, such as types, annotations or imports. For creating AST representations of Java source files, I use the library javaparser⁶.
In listing 4.1, I demonstrate how Scala pattern matching [18] is used to recursively browse an AST and search for a specific annotation. The method receives three arguments: an instance of CompilationUnit, which is an abstract representation of a single source file; the name of an annotation to search for; and a closure to call for every annotation found. The method recursively checks all types (classes or interfaces) and their members (fields, constructors, methods and inner classes) for a declaration with the specified annotation. If such an annotation is found, the closure is called with the found annotation as an argument. This solution is very flexible: the client just passes in an arbitrary closure, which is guaranteed to be called for all matching annotations found. It is the responsibility of the caller⁷ to do something useful with it.
The very same approach is used to search for occurrences of vendor-specific types in an AST, so pattern matching and recursive functions are of great use here again. I browse the AST from the top⁸, pattern match on the class of the current statement and recursively unwrap simpler statements out of the more complex ones. For example, when parsing an if statement, I recursively parse its condition, if block and else block. When a vendor-specific type is encountered at any point, the closure passed as an argument to the parsing function is called, passing it the found type as an argument.

⁶ http://code.google.com/p/javaparser/
⁷ The callers in the current architecture are the processors.

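// Imports assumed for javaparser 1.x (package japa.parser) and for the
// implicit Java-to-Scala collection conversions of Scala 2.9.
import japa.parser.ast.CompilationUnit
import japa.parser.ast.body._
import japa.parser.ast.expr.AnnotationExpr
import scala.collection.JavaConversions._

def processAnnotations(unit: CompilationUnit,
                       annotation: String,
                       closure: AnnotationExpr => Unit): Unit = {
  // Start with the top-level types; the list is null for a file without types.
  if (unit.getTypes != null)
    unit.getTypes.foreach(handleMember)

  def handleMember(element: BodyDeclaration): Unit = {
    element match {
      // Classes and interfaces: check the type itself, then recurse into its members.
      case x: ClassOrInterfaceDeclaration =>
        handleAllowedType()
        // Null guard added on the assumption that an empty body yields null, as with getTypes.
        if (x.getMembers != null)
          x.getMembers.foreach(handleMember)
      // Fields, constructors and methods: only their own annotations are checked.
      case _: FieldDeclaration |
           _: ConstructorDeclaration |
           _: MethodDeclaration => handleAllowedType()
      // Any other kind of member is ignored.
      case _ =>
    }

    // Calls the closure for every annotation of the element whose name matches.
    def handleAllowedType(): Unit = {
      if (element.getAnnotations != null)
        element.getAnnotations.foreach(x =>
          if (x.getName.getName == annotation) closure(x))
    }
  }
}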

Listing 4.1: Recursively searching the abstract syntax tree for vendor-specific annotations using Scala pattern matching

⁸ It means that the parsing starts with the whole source file, then recurses into all the types it contains, then into all their fields, methods and constructors, which contain various language constructs, such as ifs, loops, anonymous classes, returns, etc., down to atomic statements like string literals or null expressions.
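The search for vendor-specific types described above follows the same recursive pattern. The sketch below uses a hypothetical miniature AST (Node, TypeRef, Literal, Block, If) instead of the javaparser node classes, and the function processTypes is an illustrative name rather than the application's API; it only shows how an if statement is unwrapped into its condition and both branches.

```scala
// Hypothetical miniature AST; the real tool works on javaparser node classes instead.
object TypeSearchSketch {
  sealed trait Node
  final case class TypeRef(name: String) extends Node   // a referenced type
  final case class Literal(value: String) extends Node  // atomic node, nothing to unwrap
  final case class Block(statements: List[Node]) extends Node
  final case class If(condition: Node, thenBlock: Node,
                      elseBlock: Option[Node]) extends Node

  /** Walks the tree from the top and calls `closure` for every type in the given package. */
  def processTypes(node: Node, vendorPackage: String)(closure: TypeRef => Unit): Unit =
    node match {
      case t: TypeRef =>
        if (t.name.startsWith(vendorPackage)) closure(t)
      case Block(statements) =>
        statements.foreach(s => processTypes(s, vendorPackage)(closure))
      case If(condition, thenBlock, elseBlock) =>
        // An if statement is unwrapped into its condition and both branches.
        processTypes(condition, vendorPackage)(closure)
        processTypes(thenBlock, vendorPackage)(closure)
        elseBlock.foreach(e => processTypes(e, vendorPackage)(closure))
      case Literal(_) => // atomic: the recursion stops here
    }
}
```

A caller could, for instance, pass a closure that appends each found type name to a buffer and later turns the buffer into migration hints; what happens with the found types is entirely up to the caller.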

4.3 Ideas for a further development

The application is written in a clean functional style, with emphasis on extensibility and code clarity. Migrating additional constructs is just a matter of writing a processor (which is very simple if one of the abstract classes providing support for manipulating the DOM or browsing an AST is extended) and plugging it into the application by configuring the class name of the processor in the config.properties configuration file. The current implementation contains support for migrating almost all the features described in chapter 2. However, as I noted earlier, both JPA implementations contain additional features which were not discussed and the parsing of which is currently not implemented. Therefore, future development might focus on parsing and migrating these additional vendor-specific features.
An additional problem is that the parser used for creating the AST supports Java only up to version 5. Therefore, parsing Java source files which contain new language constructs introduced in Java 6 or Java 7 (such as the diamond operator or the try-with-resources statement) leads to parsing errors. Unfortunately, this is not easy to fix, because it would require replacing the parser and very likely also changing most of the processing logic.


Figure 4.1: Class diagram of the migration application



5 Conclusion
In the theoretical part of the thesis, I compared the most popular JPA implementations according to various criteria I selected. The expected output of any comparison is a ranking indicating who the winners and the losers are. Unfortunately, creating such a ranking is not possible in the case of JPA implementations. All implementations have their strengths and weaknesses, and the choice heavily depends on the needs of the implementer. Hibernate would be the choice for its large community, high-quality documentation, integration with many other frameworks and the support provided by JBoss. EclipseLink, however, is the reference implementation of the JPA standard and its biggest advantage is its great support for stored procedures. And finally, OpenJPA contains the most powerful features in the area of schema generation and generation of persistence classes from an existing database schema. However, as I discovered in chapter 3, if a project heavily utilizes native features of any JPA implementation, it is necessary to remember that the migration from one provider to another will be difficult and will require significant changes in the code base.
The practical part involved creating an application providing support for migrating a project from one JPA provider to another. The application is developed in the Scala programming language and is called JPA Migration. It goes through all the sources contained in the source directory, searches for vendor-specific features and tries to migrate them automatically either to the standard API or, if there is no standard alternative, to Hibernate. If it is not possible to do it automatically, it displays a hint on how the particular vendor-specific feature could be migrated manually, as well as a link to the corresponding chapter in the documentation.


A Generated database schemas


A.1 Hibernate

create table "user" (
user_id int8 not null,
enabled boolean,
username varchar(255) not null unique,
primary key (user_id)
)
create table article (
article_id int8 not null,
content varchar(255) not null,
created timestamp,
headline varchar(255) not null,
discussion_id int8 not null,
primary key (article_id)
)
create table articles_tags (
article_id int8 not null,
tag_id int8 not null,
primary key (article_id, tag_id)
)
create table discussion (
discussion_id int8 not null,
primary key (discussion_id)
)
create table discussions_posts (
discussion_id int8 not null,
post_id int8 not null unique,
unique (post_id)
)
create table post (
post_id int8 not null,
author varchar(255) not null,
created timestamp,
text varchar(255) not null,
title varchar(255) not null,
primary key (post_id)
)
create table tag (
tag_id int8 not null,
name varchar(255) not null unique,
primary key (tag_id)
)
create table users_articles (
user_id int8 not null,
article_id int8 not null,
primary key (user_id, article_id)
)
create table users_authorities (
user_id int8 not null,
name varchar(255) not null,
primary key (user_id, name)
)
alter table article
add constraint FKD458CCF6386589CA
foreign key (discussion_id)
references discussion
alter table articles_tags
add constraint FK487AF8DBDF6FF6EA
foreign key (article_id)
references article
alter table articles_tags
add constraint FK487AF8DBBBDB3C6A
foreign key (tag_id)
references tag



alter table discussions_posts
add constraint FKA16447DF386589CA
foreign key (discussion_id)
references discussion
alter table discussions_posts
add constraint FKA16447DF48822CA
foreign key (post_id)
references post
alter table users_articles
add constraint FK2F4ABE94DF6FF6EA
foreign key (article_id)
references article
alter table users_articles
add constraint FK2F4ABE941316CEEA
foreign key (user_id)
references "user"
alter table users_authorities
add constraint FK6555336A1316CEEA
foreign key (user_id)
references "user"
create sequence sample_sequence

Listing A.1: Hibernate-generated sample database schema

A.2 OpenJPA
CREATE SEQUENCE sample_sequence START WITH 1 CACHE 50;
CREATE TABLE "user" (user_id BIGINT NOT NULL, enabled
BOOL, username VARCHAR(255) NOT NULL, PRIMARY KEY
(user_id), CONSTRAINT U_USER_USERNAME UNIQUE
(username));



CREATE TABLE article (article_id BIGINT NOT NULL,
content VARCHAR(255) NOT NULL, created TIMESTAMP,
headline VARCHAR(255) NOT NULL, discussion_id
BIGINT NOT NULL, PRIMARY KEY (article_id));
CREATE TABLE articles_tags (article_id BIGINT, tag_id
BIGINT);
CREATE TABLE discussion (discussion_id BIGINT NOT
NULL, PRIMARY KEY (discussion_id));
CREATE TABLE discussions_posts (discussion_id BIGINT
NOT NULL, post_id BIGINT NOT NULL, CONSTRAINT
U_DSCSSTS_POST_ID UNIQUE (post_id));
CREATE TABLE post (post_id BIGINT NOT NULL, author
VARCHAR(255) NOT NULL, created TIMESTAMP, text
VARCHAR(255) NOT NULL, title VARCHAR(255) NOT
NULL, PRIMARY KEY (post_id));
CREATE TABLE tag (tag_id BIGINT NOT NULL, name
VARCHAR(255) NOT NULL, PRIMARY KEY (tag_id),
CONSTRAINT U_TAG_NAME UNIQUE (name));
CREATE TABLE users_articles (user_id BIGINT,
article_id BIGINT);
CREATE TABLE users_authorities (user_id BIGINT, name
VARCHAR(255) NOT NULL);
CREATE INDEX I_ARTICLE_DISCUSSION ON article
(discussion_id);
CREATE INDEX I_RTCLTGS_ARTICLE_ID ON articles_tags
(article_id);
CREATE INDEX I_RTCLTGS_ELEMENT ON articles_tags
(tag_id);
CREATE INDEX I_DSCSSTS_DISCUSSION_ID ON
discussions_posts (discussion_id);
CREATE INDEX I_DSCSSTS_ELEMENT ON discussions_posts
(post_id);
CREATE INDEX I_SRS_CLS_ELEMENT ON users_articles
(article_id);
CREATE INDEX I_SRS_CLS_USER_ID ON users_articles
(user_id);
CREATE INDEX I_SRS_RTS_USER_ID ON users_authorities
(user_id);

Listing A.2: OpenJPA-generated sample database schema



A.3 EclipseLink

CREATE TABLE article (article_id BIGINT NOT NULL,
content VARCHAR(255) NOT NULL, created TIMESTAMP,
headline VARCHAR(255) NOT NULL, discussion_id
BIGINT NOT NULL, PRIMARY KEY (article_id))
CREATE TABLE discussion (discussion_id BIGINT NOT
NULL, PRIMARY KEY (discussion_id))
CREATE TABLE post (post_id BIGINT NOT NULL, author
VARCHAR(255) NOT NULL, created TIMESTAMP, text
VARCHAR(255) NOT NULL, title VARCHAR(255) NOT
NULL, PRIMARY KEY (post_id))
CREATE TABLE tag (tag_id BIGINT NOT NULL, name
VARCHAR(255) NOT NULL UNIQUE, PRIMARY KEY (tag_id))
CREATE TABLE "user" (user_id BIGINT NOT NULL, enabled
BOOLEAN, username VARCHAR(255) NOT NULL UNIQUE,
PRIMARY KEY (user_id))
CREATE TABLE users_articles (article_id BIGINT NOT
NULL, user_id BIGINT NOT NULL, PRIMARY KEY
(article_id, user_id))
CREATE TABLE articles_tags (article_id BIGINT NOT
NULL, tag_id BIGINT NOT NULL, PRIMARY KEY
(article_id, tag_id))
CREATE TABLE discussions_posts (discussion_id BIGINT
NOT NULL, post_id BIGINT NOT NULL UNIQUE, PRIMARY
KEY (discussion_id, post_id))
CREATE TABLE users_authorities (user_id BIGINT, name
VARCHAR(255) NOT NULL)
ALTER TABLE article ADD CONSTRAINT
FK_article_discussion_id FOREIGN KEY
(discussion_id) REFERENCES discussion
(discussion_id)
ALTER TABLE users_articles ADD CONSTRAINT
FK_users_articles_article_id FOREIGN KEY
(article_id) REFERENCES article (article_id)
ALTER TABLE users_articles ADD CONSTRAINT
FK_users_articles_user_id FOREIGN KEY (user_id)
REFERENCES "user" (user_id)



ALTER TABLE articles_tags ADD CONSTRAINT
FK_articles_tags_tag_id FOREIGN KEY (tag_id)
REFERENCES tag (tag_id)
ALTER TABLE articles_tags ADD CONSTRAINT
FK_articles_tags_article_id FOREIGN KEY
(article_id) REFERENCES article (article_id)
ALTER TABLE discussions_posts ADD CONSTRAINT
FK_discussions_posts_post_id FOREIGN KEY (post_id)
REFERENCES post (post_id)
ALTER TABLE discussions_posts ADD CONSTRAINT
FK_discussions_posts_discussion_id FOREIGN KEY
(discussion_id) REFERENCES discussion
(discussion_id)
ALTER TABLE users_authorities ADD CONSTRAINT
FK_users_authorities_user_id FOREIGN KEY (user_id)
REFERENCES "user" (user_id)
CREATE SEQUENCE sample_sequence INCREMENT BY 50 START
WITH 50

Listing A.3: EclipseLink-generated sample database schema


Bibliography
[1] R. Ramakrishnan and J. Gehrke, Database management systems. McGraw-Hill international editions: Computer science series, McGraw-Hill, 2003.
[2] A. Silberschatz, H. Korth, and S. Sudarshan, Database System Concepts. McGraw-Hill, 2010.
[3] R. Stephens, Beginning Database Design Solutions. John Wiley & Sons, 2010.
[4] N. H. Bercich, The evolution of the computerized database, ARXIV, 2003.
[5] E. F. Codd, A relational model of data for large shared data banks, Commun. ACM, vol. 13, pp. 377-387, June 1970.
[6] C. Bauer and G. King, Java persistence with Hibernate. Manning Pubs Co Series, Manning, 2007.
[7] M. Keith and M. Schincariol, Pro JPA 2: mastering the Java Persistence API. Apress Series, Apress, 2009.
[8] JSR 317: Java Persistence API, Version 2.0. http://jcp.org/en/jsr/detail?id=317, Dec. 2009.
[9] Hibernate developer guide. http://docs.jboss.org/hibernate/core/4.0/devguide/en-US/html/.
[10] OpenJPA user's guide. http://openjpa.apache.org/builds/2.1.1/apache-openjpa/docs/manual.html.
[11] EclipseLink 2.3 API documentation. http://www.eclipse.org/eclipselink/api/2.3/index.html.
[12] Hibernate ORM 4.0 API documentation. http://docs.jboss.org/hibernate/orm/4.0/javadocs/.
[13] EclipseLink project wiki. http://wiki.eclipse.org/Category:EclipseLink/Documentation/JPA.
[14] OpenJPA 2.2.0 API documentation. http://openjpa.apache.org/builds/2.2.0/apidocs/index.html.
[15] M. Fisher, J. Ellis, and J. Bruce, JDBC API Tutorial and Reference. Java Series, Addison-Wesley, 2003.
[16] JSR 303: Bean Validation. http://jcp.org/en/jsr/detail?id=303, Nov. 2009.
[17] Hibernate Validator reference documentation. http://docs.jboss.org/hibernate/validator/4.2/reference/en-US/html/.
[18] M. Odersky, L. Spoon, and B. Venners, Programming in Scala. Artima Series, Artima Press, 2011.
