MASARYKOVA UNIVERZITA FAKULTA INFORMATIKY

DIPLOMA THESIS

Comparison of JPA providers and issues with migration

Lukáš Šembera

Brno, June 2012

Declaration
I hereby declare that this paper is my original authorial work, which I have worked out on my own. All sources, references and literature used or excerpted during the elaboration of this work are properly cited and listed in complete reference to the due source.

Lukáš Šembera

Advisor: Jiří Pechanec, Red Hat Czech, s.r.o.

Acknowledgement
I would like to thank my technical advisor Jiří Pechanec from Red Hat Czech for his valuable comments and suggestions. I would also like to thank my fiancée Daria for her support during writing.


Abstract
This thesis aims to compare three implementations of the JPA standard, specifically Hibernate, OpenJPA and EclipseLink. Besides the comparison, it also describes the migration of various real-world applications between those JPA implementations and documents the issues that developers might typically run into. The practical part involves developing an application that provides support when migrating projects between those three JPA providers.


Keywords
JPA, JPA2, Hibernate, OpenJPA, EclipseLink, Java, persistence, relational, databases, Scala


Contents
1 Introduction
1.1 Database management systems
1.1.1 Relational databases
1.1.2 Object-oriented databases
1.1.3 NoSQL databases
1.2 Object-relational mismatch
1.3 Brief history of Java persistence solutions
1.3.1 JDBC
1.3.2 EJB 2.x entity beans
1.3.3 JDO
1.3.4 myBatis
1.4 JPA
1.5 Goals of the thesis
2 Comparison of JPA providers
2.1 Methodology of the comparison
2.2 Identifier generation
2.3 Performance
2.3.1 Batch inserts
2.3.2 Searching by ID
2.3.3 Basic JPA QL test
2.3.4 Basic criteria API test
2.3.5 Aggregate function
2.3.6 Performance summary
2.4 Type conversion
2.5 Caching support
2.6 Entity lifecycle and transactional events
2.7 Schema generation
2.8 Support for stored procedures
2.9 Integrating with other frameworks
2.10 Licenses
2.11 Documentation quality
2.12 Build systems
2.13 Summary
3 Experimental migration of JPA applications
3.1 Migrating from Hibernate
3.2 Migrating from OpenJPA
3.3 Migrating from EclipseLink
3.4 Migration summary
4 Automatic migration tool
4.1 The application architecture
4.2 Java source files parsing
4.3 Ideas for a further development
5 Conclusion
A Generated database schemas
A.1 Hibernate
A.2 OpenJPA
A.3 EclipseLink

Listings
1.1 Sample of JDBC code
2.1 DDL defining sample database schema
2.2 Sample stored procedure
4.1 Recursively searching the abstract syntax tree for vendor-specific annotations using Scala pattern matching
A.1 Hibernate-generated sample database schema
A.2 OpenJPA-generated sample database schema
A.3 EclipseLink-generated sample database schema

List of Figures
2.1 ER diagram of sample database schema
4.1 Class diagram of the migration application

List of Tables
2.1 Batch inserts on PostgreSQL test results
2.2 Batch inserts on MySQL test results
2.3 Find by ID test results
2.4 Fetch all users using JPA QL test results
2.5 Fetch all users using criteria API test results
2.6 Complex join using JPA QL test results
2.7 Complex join using criteria API test results
2.8 Feature matrix

1 Introduction
Every application, except the most basic ones, has to deal with data. The very first computers were designed as black boxes receiving input, doing some calculations and producing output. Since then, computers have become much more complicated and nowadays they do much more than such simple data processing. Nevertheless, they still operate on data stored on some kind of permanent storage device, such as a hard drive.

Input data for an application could be saved, without much thinking, into an ordinary text file. However, such files are next to impossible to process by machine because they do not follow any rules which would describe their structure. For this reason, a variety of rules that the data have to follow is often introduced (e.g. the structure is described by XML with an appropriate XML Schema definition). Even if the data are in an easily computer-readable form, the biggest problem with this “file-based” approach remains. It is still just a text file and, therefore, the data access is limited by the I/O operations of the operating system.

Demands of current enterprise applications, however, go far beyond the possibilities of such file-based persistence. We require reliability, transaction management, high-performance concurrent access, advanced user access control and much more. To support all of these advanced features, database management systems have been invented.

1.1 Database management systems

A database management system (DBMS), as defined in [1], is software designed to assist in maintaining and utilizing large collections of data. Each DBMS has its model, which describes data, data relationships, semantics and consistency constraints [2]. It is basically a theoretical foundation upon which database management systems operate. During the last few decades, several database models have been invented.

In the 1960s, IBM introduced its database management system IMS, which internally uses the hierarchical database model. The hierarchical model stores data in records, which are connected with each other through links, creating tree-like structures [2]. An evolution of the hierarchical model is the network model, which allows records to be connected in arbitrary graphs, thus making data modelling more flexible (e.g. it allows many-to-many relationships between records). Even though hierarchical and network databases exist and are still in use¹, the models have many flaws (further discussed in [4]), which make their usage in certain scenarios particularly complicated.

1.1.1 Relational databases

In 1970, E. F. Codd published a revolutionary paper [5], where he laid out the concept of the relational data model, which is the theoretical foundation of relational databases. Thanks to its flexibility², simplicity and strong but simple formal background (which allows mathematical reasoning about data), its popularity grew rapidly. Many commercial and open-source implementations exist; they are very mature and industry-proven, and the relational model itself is very well understood and documented. For these reasons, relational databases are effectively an industry standard and their knowledge is essential for every programmer.

1.1.2 Object-oriented databases

In the last decade, under the influence of object-oriented programming, the concepts of object-oriented (OODBMS) and object-relational (ORDBMS) database management systems have arisen. OODBMS allow object graphs to be stored in the database directly and are very often integrated with the programming language itself. Thus, they provide a homogeneous environment and remove the necessity of various transformations when data are passed back and forth between the application and the data layer.

Even though object-oriented databases have undeniable benefits, their popularity is not very high. This is not only because of the enormous amounts of data that are already stored in relational databases (and whose migration would not be cost-free), but also because of some technical issues they are facing and which are not yet resolved³. Moreover, vendors of relational databases are integrating various object-oriented features into their products and are thus making the need for pure object-oriented databases less urgent.

1.1.3 NoSQL databases

Recently, with the rise of interest in cloud computing, a new category of databases has emerged, the so-called NoSQL⁴ databases. NoSQL is neither a specific database model, nor an evolution of relational or object-oriented databases; it is rather a group of database products which are suited to specific scenarios, often where other solutions fail. They often offer only a subset of the features of relational databases, but they are superior in certain characteristics. For example, MongoDB is a document-oriented database which shines at speed and scalability, but by design it lacks full transaction management⁵ and, therefore, its use case is in large clusters where transactional behaviour is not crucial.

1. Probably the best known hierarchical database is the Windows System Registry [3].
2. By “flexibility” I mean the ability of the relational model to hide its internal data representation. Clients thus do not need any knowledge of how data are physically stored and, therefore, are not affected when the server implementation changes.

1.2 Object-relational mismatch

Currently, most data are stored in relational databases. In programming languages, however, the object-oriented approach predominates. It reflects reality well and models the interactions of entities and their behaviour. In object-oriented programming, there are fundamental concepts like association, inheritance or polymorphism, which do not have corresponding counterparts in the world of relational databases.

The object-relational mismatch occurs when data, representing some business information we need to process, are stored in a relational database. In the application processing the data, however, the object-oriented approach is used and everything is modelled using objects and other OOP concepts. Therefore, transformations are needed each time data are passed between the application layer and the data layer. These transformations might not be complicated if the objects are simple data holders containing only basic data types, but once we want to make use of advanced OOP features, things get much more complicated.

3. Discussed in more detail at http://www.leavcom.com/db_08_00.htm
4. Abbreviation of “Not Only SQL”.
5. MongoDB supports atomic operations on a single document.
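The transformation described above can be illustrated with a minimal Java sketch. It is only an illustration under stated assumptions: the classes User and Article, the table names and the textual row format are hypothetical and chosen for this example. The sketch shows how a single association in the object model turns into rows in three tables, including a join table that has no counterpart among the objects.

```java
import java.util.ArrayList;
import java.util.List;

public class MismatchSketch {

    // Object model: a user directly references its articles (an association).
    static class Article {
        final long id;
        final String headline;

        Article(long id, String headline) {
            this.id = id;
            this.headline = headline;
        }
    }

    static class User {
        final long id;
        final String username;
        final List<Article> articles = new ArrayList<>();

        User(long id, String username) {
            this.id = id;
            this.username = username;
        }
    }

    // Relational view: the object graph is flattened into rows. The association
    // becomes a row of two identifiers in a (hypothetical) join table.
    static List<String> toRows(User user) {
        List<String> rows = new ArrayList<>();
        rows.add("user(" + user.id + ", " + user.username + ")");
        for (Article article : user.articles) {
            rows.add("article(" + article.id + ", " + article.headline + ")");
            rows.add("users_articles(" + user.id + ", " + article.id + ")");
        }
        return rows;
    }

    public static void main(String[] args) {
        User user = new User(1, "alice");
        user.articles.add(new Article(10, "Hello"));
        for (String row : toRows(user)) {
            System.out.println(row);
        }
    }
}
```

Even in this tiny case, the mapping code has to know about identifiers and a join table that the object model never mentions; inheritance and polymorphism make such hand-written transformations considerably longer.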

1.3 Brief history of Java persistence solutions

1.3.1 JDBC

The object-relational mismatch can be tackled “by hand” using plain JDBC, which is Java’s API for database access. The API is database-independent⁶ and database vendors provide JDBC drivers for their systems. Here is a very simple piece of code which saves a person into a database using JDBC:

    // Assumes a field "connection" holding an open java.sql.Connection.
    public void savePerson(Person p) throws SQLException {
        String query = "INSERT INTO PERSON VALUES (DEFAULT, ?, ?)";
        // try-with-resources closes the statement even if the insert fails
        try (PreparedStatement stmt = connection.prepareStatement(query)) {
            stmt.setString(1, p.getName());
            stmt.setString(2, p.getSurname());
            stmt.executeUpdate();
        }
    }

Listing 1.1: Sample of JDBC code

The advantage of this approach is that it gives the programmer full control over the SQL queries sent to the database. On the other hand, such code is rather low-level and pollutes the service layer with SQL statements and checked exceptions. It also leads to procedural code because it forces the programmer to “unpack” primitive properties⁷ from domain objects and put them into SQL statements manually. This is especially tedious and error-prone when working with larger object graphs and cascaded saving.

For the reasons above, we usually look for a tool or a framework which does the tedious work for us: it analyses our classes, generates SQL statements, automatically binds parameters, etc. In the coming paragraphs, I will briefly discuss different approaches to Java object-relational mapping.

It is important to remember, however, that Java database programmers can never avoid JDBC entirely. Since all persistence solutions are built on top of JDBC, understanding it is essential to fine-tune the persistence framework in certain scenarios or to check logs when something goes wrong. As Gavin King and Christian Bauer state in their book, high-level persistence solutions are not here for programmers who do not want to learn or do not understand JDBC, SQL or the relational model itself. They are here for those who have already done it the hard way [6].

6. Even though the JDBC API is database-independent, the SQL statements are not, so it is important to be careful when utilizing non-standard SQL queries.
7. By “primitive properties” I mean properties of primitive data types, which are directly supported by database systems.

1.3.2 EJB 2.x entity beans

Since the very beginning of the J2EE specification, there has been a technology aimed at Java persistence, called EJB entity beans. Entity beans are container-managed components providing various services, such as persistence or transaction management. The specification, however, was largely over-engineered from the beginning [7]. It builds on the fundamental concept that persistence should be non-intrusive to the application and should rather be a service provided by the container. This leads to overwhelming complexity of both the specification and the applications using it.

EJB entity beans were a widely used technology, but due to their complexity and general unhappiness with the specification, companies were often forced to create various proprietary persistence solutions. Several open-source frameworks have also been created, with Hibernate being the most widely used one.

For backward compatibility, EJB entity beans are still present in the Java EE specification, so every Java EE compliant application server has to support them. They are, however, considered deprecated in favour of the new JPA specification.

1.3.3 JDO

With rising frustration with EJB entity beans, there was an attempt to come up with an alternative: a new persistence specification which would work with POJOs⁸ and finally abandon the concept of container-managed persistence. This specification is called Java Data Objects. Even though JDO is quite powerful, in some aspects even more powerful than its successor, JPA⁹, it failed to gain wider popularity among developers and to become mainstream. JDO requires byte-code manipulation to enhance persistent classes and, therefore, is quite complicated as well. Mike Keith [7] also claims that one of the reasons why JDO has failed is its inherently object-oriented query language, which does not play well with programmers used to relational databases.

1.3.4 myBatis

myBatis (formerly iBatis) is a lightweight persistence framework that gives the programmer full control over the SQL queries sent to the database. It does not generate any SQL code; it merely maps custom SQL statements to the properties of entities being stored in the database. Despite all its advantages and interesting ideas, it is not a full-blown persistence solution since it lacks features demanded from a general-purpose persistence framework, such as portability across different database systems (all the SQL code is database-specific). Moreover, myBatis is not part of the Java EE specification, which also means that it does not integrate with the rest of the Java EE ecosystem and, therefore, features like container-managed transaction handling, entity lifecycle callbacks or JSR-303 Bean Validation are not supported.
8. Abbreviation of Plain Old Java Object, denoting ordinary Java classes which do not follow any special conventions or framework rules (http://www.martinfowler.com/bliki/POJO.html).
9. JDO, for example, supports non-relational data stores, whereas JPA does not.


1.4 JPA

The EJB 3 specification, part of the completely reworked Java EE 5 released in 2006, contained a new specification regarding persistence: the Java Persistence API¹⁰. JPA was a response to users’ increasing frustration with the complexity of EJB 2.x entity beans. Authors of proprietary persistence frameworks and other experts were invited to sit in groups working on a brand new Java persistence specification, which would replace EJB entity beans. JPA2 (included in Java EE 6, released in 2009) is an evolution of JPA. It is based on the experience with JPA and reflects users’ critique (mostly about missing features which were already present in other proprietary persistence frameworks). In this text, I will focus only on the JPA2 specification and its features¹¹.

1.5 Goals of the thesis

In this thesis, I will:
• Compare three different JPA implementations and build a feature matrix showing their strengths and weaknesses.
• Take an open source project written in each JPA implementation, migrate it to the other two, test it on Oracle, PostgreSQL and MySQL and document the issues I run into during the migration process.
• Build a migration tool, which will provide support with migrating OpenJPA and EclipseLink projects to Hibernate.

10. In Java EE 5, the JPA specification is formally a part of the EJB 3 specification. The decision to bind them together was probably quite unfortunate, though, because JPA is not by any means dependent on an EJB container and thus works perfectly fine in Java SE environments. JPA2 is already a separate specification, formally independent of EJB.
11. From this point on, wherever I use “JPA”, I mean the JPA2 specification. I will use the term “JPA2” only to emphasise that a particular feature was introduced in the new JPA2 standard.

2 Comparison of JPA providers
JPA2 is a persistence standard for the Java platform defined by Sun Microsystems in [8]. The JPA specification itself does not contain any usable code; it only describes persistence concepts and provides standard interfaces, which all standard-compliant frameworks are obliged to implement. The reference implementation of this specification is EclipseLink¹. Currently there are three main implementations of JPA: Hibernate, EclipseLink and OpenJPA. I was unable to find any reliable statistics about their popularity and market share, but considering that all of them are bundled with popular application servers² and thus are in production use, it makes sense to compare them and try to find out which one provides the most interesting features.

2.1 Methodology of the comparison

Comparing JPA providers is a tricky task. Since every single JPA implementation has to obey the standard and implement everything that the standard defines, one might incorrectly conclude that they are all the same. This is true up to a point; within the boundaries defined by the standard they are all equal. However, each JPA implementation provides features that go beyond the scope of the JPA standard. These vendor-specific features extend the framework’s functionality in various areas.

I have divided the features that go beyond the scope of the JPA standard into several categories. I will go through all the categories and describe what features each implementation offers and which possible alternatives the others have. In the end, I will create a summary in the form of a short feature matrix summing up the results of the comparison.

I will mainly focus on features which are directly related to JPA and extend it in some way. Therefore, I will not discuss, for example, EclipseLink’s support for non-relational data stores, because it is not a JPA extension but merely an additional capability of the framework. In order to guarantee a fair comparison, I will work with the latest versions of all frameworks; at the time of writing, the latest stable versions available are Hibernate 4.0.0, EclipseLink 2.3.2 and OpenJPA 2.1.1.

1. http://www.eclipse.org/eclipselink/downloads/ri.php
2. EclipseLink is used as the default JPA provider in GlassFish, OpenJPA in Geronimo and Hibernate in JBoss AS.

2.2 Identifier generation

The JPA standard defines four primary key generation strategies³ (table, sequence, identity and auto). All frameworks from the comparison, however, provide additional ways to generate identifiers.

In Hibernate, there is an annotation @GenericGenerator creating a non-standard ID generator. Its parameter is either the fully-qualified name of a class implementing IdentifierGenerator or a shortcut for one of the many predefined generators from the Hibernate distribution, such as increment, identity, sequence, hilo, uuid, guid and others. Since Hibernate version 3.2.3, the preferred way of generating identifiers is using TableGenerator and SequenceStyleGenerator, due to better optimization and database portability [9]⁴.

OpenJPA also allows creating user-defined identity generators by implementing the Seq interface [10]⁵. Besides the standard ones, the OpenJPA distribution contains a few additional generators, most interestingly TimeSeededSeq, generating identifiers based on system time, and UUIDHexSeq, generating random hex strings.

In EclipseLink, it is possible to create custom generators by extending Sequence and registering the generator class in the persistence unit configuration (persistence.xml) as follows:

    <properties>
        ...
        <property name="eclipselink.session.customizer"
                  value="com.example.eclipselink.CustomIdGenerator"/>
        ...
    </properties>

The distribution contains QuerySequence, an implementation of Sequence, which is not only the parent of all the standard sequence generators contained in the EclipseLink distribution, but also serves as a generic mechanism for obtaining identifiers using user-defined queries [11].

3. Java™ Persistence API, Version 2.0, section 11.1.17
4. Hibernate developer guide, section 28.4
5. OpenJPA user’s guide, section 9.6
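To make the idea behind block-based generators (such as the hilo family mentioned above) more concrete, here is a minimal sketch of block allocation. It is not the code of any of the compared frameworks: the class BlockIdAllocator, the interface SequenceSource and the block size of 50 are assumptions made for this illustration. The point is that the database is asked only once per block of identifiers, which reduces the number of round trips during bulk inserts.

```java
public class BlockIdAllocator {

    /** Hypothetical stand-in for a database sequence; each call models one round trip. */
    interface SequenceSource {
        long nextValue();
    }

    private final SequenceSource source;
    private final int blockSize;
    private long next;   // next identifier to hand out
    private long limit;  // first identifier beyond the current block (exclusive)

    BlockIdAllocator(SequenceSource source, int blockSize) {
        this.source = source;
        this.blockSize = blockSize;
        // next == limit means "no block reserved yet", so the first call fetches one
        this.next = 0;
        this.limit = 0;
    }

    /** Returns the next identifier, reserving a new block only when the current one is used up. */
    synchronized long nextId() {
        if (next >= limit) {
            long start = source.nextValue();
            next = start;
            limit = start + blockSize;
        }
        return next++;
    }

    public static void main(String[] args) {
        // Simulated sequence starting at 1000 and incremented by 50 on each call.
        final long[] calls = {0};
        BlockIdAllocator allocator =
                new BlockIdAllocator(() -> 1000 + 50 * calls[0]++, 50);
        for (int i = 0; i < 120; i++) {
            allocator.nextId();
        }
        System.out.println("identifiers: 120, sequence calls: " + calls[0]);
    }
}
```

The sequence increment and the block size have to match; otherwise two application instances sharing the same sequence could hand out overlapping identifiers.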

2.3 Performance

Good performance is a fundamental requirement for any persistence solution. In this section, I will benchmark all the frameworks being compared and measure how fast they perform in various usage scenarios. For the purpose of this benchmark, I created a sample JPA application that contains several entities, which are mapped to the database schema from diagram 2.1.

Figure 2.1: ER diagram of sample database schema

The persistence.xml configuration file contains several persistence units, each one configured with a different data source and a different JPA provider. The application also contains a simple testing framework, which is in charge of creating an EntityManagerFactory for a particular persistence unit, initializing⁶ and running tests, measuring execution times and logging the results.

In the benchmark, I test how the frameworks perform in the default configuration, without any vendor-specific optimizations. Each test is executed three times and the average time is presented as the test result. All tests run on the following configuration:
• Intel Core2 Quad Q9400 2.66GHz
• 4GB RAM
• Xubuntu Linux 11.10, 64-bit
• Oracle JDK 7
• PostgreSQL 9.1.3
• JDBC4 driver, version 9.1-901
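The measurement procedure described above (several runs, arithmetic mean) can be sketched as follows. This is not the testing framework used in the thesis; the class AverageTimer and its methods are hypothetical names introduced only to show the principle, and the measured task is a placeholder.

```java
public class AverageTimer {

    /** Runs the task the given number of times and returns the mean wall-clock time in milliseconds. */
    static double averageMillis(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / runs;
    }

    /** Arithmetic mean of already measured run times. */
    static double mean(long... millis) {
        long sum = 0;
        for (long m : millis) {
            sum += m;
        }
        return (double) sum / millis.length;
    }

    public static void main(String[] args) {
        // Placeholder workload; a real test would persist or query entities here.
        double avg = averageMillis(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        }, 3);
        System.out.println("average over 3 runs: " + avg + " ms");
    }
}
```

With only three runs, JVM warm-up and caching effects in the first run can noticeably influence the mean, which is one more reason to read the absolute numbers with caution.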

To guarantee that the tests of all frameworks run under the same conditions⁷, I disabled automatic schema generation and created the schema manually, as shown in listing 2.1.

    create sequence sample_sequence start 1000 increment 50;

    create table "user" (
        user_id bigint primary key,
        username varchar(32) not null unique,
        enabled boolean not null default true);

    create table users_authorities (
        user_id bigint references "user",
        name varchar(32) not null);

    create table discussion (
        discussion_id bigint primary key);

    create table article (
        article_id bigint primary key,
        created timestamp default now(),
        headline text,
        content text,
        discussion_id bigint not null unique references discussion);

    create table users_articles (
        user_id bigint references "user" (user_id),
        article_id bigint references article);

    create table tag (
        tag_id bigint primary key,
        name varchar(32) not null unique);

    create table articles_tags (
        article_id bigint not null references article,
        tag_id bigint not null references tag);

    create table post (
        post_id bigint primary key,
        created timestamp not null default now(),
        author varchar(32),
        title varchar(32),
        text text);

    create table discussions_posts (
        discussion_id bigint not null references discussion,
        post_id bigint not null unique references post);

Listing 2.1: DDL defining sample database schema

In the following sections, I will present the results of the performance benchmarks.

6. Some tests need data already present in the database. The required INSERT statements are, therefore, executed in the initialization phase, the duration of which is not included in the test result.
7. As seen in appendix A.2, OpenJPA generates some additional indexes, which would need to be updated with each insert and, therefore, test execution times would be affected.

2.3.1 Batch inserts

The first test benchmarks batch inserts. I persist 10,000 instances of the User entity⁸. The results are presented in table 2.1.

                 Run 1       Run 2       Run 3       Average
    Hibernate    4678 ms     4744 ms     4768 ms     4730 ms
    OpenJPA      43482 ms    44779 ms    45813 ms    44691.33 ms
    EclipseLink  3799 ms     3832 ms     3825 ms     3818.67 ms

Table 2.1: Batch inserts on PostgreSQL test results

From the values in table 2.1, we can see that EclipseLink was the fastest, with Hibernate being second. OpenJPA, on the other hand, scored more than 10 times worse. Since such a result is very surprising, I decided to run the exact same test on the same machine, but this time with a MySQL database⁹. The results are presented in table 2.2.

                 Run 1       Run 2       Run 3       Average
    Hibernate    5128 ms     5042 ms     5137 ms     5132.33 ms
    OpenJPA      10790 ms    10696 ms    10617 ms    10701 ms
    EclipseLink  3377 ms     3257 ms     3387 ms     3340.33 ms

Table 2.2: Batch inserts on MySQL test results

We can see that the difference is not so big anymore, but OpenJPA still performs much worse than the others. There are some performance optimizations available to tune OpenJPA for batch operations, but in the default configuration its performance was by far the worst of all compared frameworks.

8. Users are persisted with an auto-generated name and no articles attached.
9. The version of MySQL server used in the test was 5.1.61.

2.3.2 Searching by ID

In this test, I make use of the values inserted by the previous test. Out of the 10,000 inserted users, I fetch 1,000 random ones by ID. The results are presented in table 2.3.

                 Run 1       Run 2       Run 3       Average
    Hibernate    828 ms      863 ms      756 ms      815.66 ms
    OpenJPA      5285 ms     5155 ms     5227 ms     5222.33 ms
    EclipseLink  691 ms      683 ms      666 ms      680 ms

Table 2.3: Find by ID test results

2.3.3 Basic JPA QL test

In this test, I fetch all users stored in the database using a JPA query. Table 2.4 presents the results.

                 Run 1       Run 2       Run 3       Average
    Hibernate    1002 ms     1138 ms     1013 ms     1051 ms
    OpenJPA      1196 ms     1086 ms     1078 ms     1120 ms
    EclipseLink  1110 ms     1133 ms     1256 ms     1166.33 ms

Table 2.4: Fetch all users using JPA QL test results

2.3.4 Basic criteria API test

This test is similar in nature to the previous one, but I fetch all users using the Criteria API instead of JPA QL. The Criteria API is a new feature introduced in JPA2. The results are presented in table 2.5.

                 Run 1       Run 2       Run 3       Average
    Hibernate    1567 ms     1530 ms     1523 ms     1540 ms
    OpenJPA      1054 ms     1094 ms     1136 ms     1094.66 ms
    EclipseLink  882 ms      764 ms      862 ms      836 ms

Table 2.5: Fetch all users using criteria API test results

An interesting observation from these results is that Hibernate performs significantly slower when using the criteria API instead of JPA QL, whereas EclipseLink, on the other hand, performs slightly faster. The results of OpenJPA are about the same.

2.3.5 Aggregate function

In this test, I measure how fast entities can be counted using both JPA QL and the criteria API. In the test initialization phase, 1,000 users are inserted into the database, but only half of them have their account set as enabled. Each user has 15 articles and each article has 3 comments. In the test itself, I perform a complex join over the User, Article, Discussion and Post tables, then a selection by the User’s enabled property, and count the returned entities. The results are presented in tables 2.6 and 2.7. From the results presented in the tables below, it is interesting to see that the same query is significantly faster using the criteria API than using a standard JPA query.

                 Run 1       Run 2       Run 3       Average
    Hibernate    306 ms      250 ms      245 ms      267 ms
    OpenJPA      354 ms      400 ms      407 ms      387 ms
    EclipseLink  306 ms      295 ms      349 ms      316.66 ms

Table 2.6: Complex join using JPA QL test results

                 Run 1       Run 2       Run 3       Average
    Hibernate    127 ms      115 ms      100 ms      114 ms
    OpenJPA      176 ms      178 ms      177 ms      177 ms
    EclipseLink  104 ms      107 ms      104 ms      105 ms

Table 2.7: Complex join using criteria API test results

2.3.6 Performance summary

From the test results presented, EclipseLink is more or less on par with Hibernate. Out of seven executed tests, EclipseLink took first place five times and Hibernate two times. OpenJPA, on the other hand, performs the worst in all tests except two. Especially in the batch insert test, its performance is far behind the competitors. In the other tests the results were quite similar. Therefore, poor performance of applications using JPA is more likely caused by poor database design or incorrect use of the persistence framework (e.g. construction of queries which demand complex joins) than by the framework itself.

It is important to note that all the frameworks are highly configurable and offer various performance enhancements for specific usage scenarios. Therefore, the results of this benchmark should not be overestimated. They do provide, however, some view of the performance of the compared JPA implementations in their default configurations.

2.4 Type conversion

The JPA specification does not define any kind of type conversion. For example, if the database contains a string column which stores boolean values as “Y” and “N” strings, there is no way to map it directly to a Java boolean. All JPA implementations, however, provide extensions which allow mapping various database types to Java types and also allow creating user-defined types.

In Hibernate, there is a Type interface, which all the types Hibernate recognizes implement. So, in Hibernate there are classes like CalendarType, which maps Calendar to a datetime, ClassType, which maps Java Class objects to varchars, etc. However, for creating new custom types, it is generally not recommended to implement Type directly, because it would make custom type converters tightly coupled with the Type interface and future changes (such as added or removed methods) would break all custom type converters [12]. For this reason, there is an interface UserType that should be used for creating custom type converters, which are later adapted to Type using CustomType.

OpenJPA also provides support for creating custom mappings. There is an interface ClassStrategy which can be used for creating a mapping between custom classes and the database schema. Such a class strategy can then be configured using the @Strategy mapping annotation. For creating various custom field mappings, OpenJPA provides the ValueHandler and FieldStrategy interfaces. The latter is a bit more complicated to implement, but provides more flexibility when interacting with the database¹⁰. Since creating custom field strategies might in many cases be overly complicated, especially when only quite simple value transformations are needed, OpenJPA provides a mechanism called externalization. Using the @Externalizer annotation, we can specify either an instance method of the mapped class or a static method of any class which should be invoked to transform a value to its database representation. Its counterpart is the @Factory annotation, which specifies how the transformation from the database representation to the custom type is performed. We can pass either nothing, or the name of an instance method of the custom type or of any static method which does the conversion. If nothing is passed, the constructor of the custom type is invoked.

In EclipseLink, the primary interface for defining custom converters is Converter. Applications can either create custom implementations of this interface or use some of the predefined converters from the EclipseLink distribution. Some examples of the predefined converters are [13]:
• ObjectTypeConverter is the simplest converter available. It is used for custom mapping of database values to Java values when the formats differ.
• TypeConversionConverter can be used for explicit mapping of data source types to Java types.
• SerializedObjectConverter maps various binary formats into database BLOBs.
When a proper converter is configured, it can be attached to the mapped attribute using the @Convert annotation.
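The conversion that all three extension mechanisms ultimately have to perform can be illustrated without any provider API. The following is a minimal, provider-independent sketch of the two directions of the “Y”/“N” mapping from the example above. The class name YesNoConversion and its method names are hypothetical and do not belong to Hibernate, OpenJPA or EclipseLink.

```java
/** Provider-independent sketch of a two-way "Y"/"N" to boolean conversion. */
final class YesNoConversion {

    private YesNoConversion() { }

    /** Java value to database representation. */
    public static String toDatabase(Boolean value) {
        if (value == null) {
            return null; // an unset attribute stays SQL NULL
        }
        return value ? "Y" : "N";
    }

    /** Database representation to Java value. */
    public static Boolean toJava(String column) {
        if (column == null) {
            return null; // SQL NULL stays an unset attribute
        }
        switch (column.trim()) {
            case "Y":
                return Boolean.TRUE;
            case "N":
                return Boolean.FALSE;
            default:
                // Anything else indicates corrupted data, so fail loudly
                throw new IllegalArgumentException("Unexpected flag value: " + column);
        }
    }
}
```

In a real application, this logic would sit inside a Hibernate UserType, behind the methods named by OpenJPA's @Externalizer and @Factory annotations, or inside an EclipseLink Converter.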

2.5 Caching support

The JPA2 specification comes with basic support for a second-level cache¹¹. The second-level cache is a cache at the EntityManagerFactory level, so it contains entities from multiple persistence contexts.
10. OpenJPA user’s guide, section 7.10.3.2
11. Java™ Persistence API, Version 2.0, section 3.7


Caching can be configured using the shared-cache-mode setting in persistence.xml. Possible values are:
• ALL
• NONE
• ENABLE_SELECTIVE
• DISABLE_SELECTIVE
• UNSPECIFIED

There is also the @Cacheable annotation, used for specifying the entity-level caching mode when either the ENABLE_SELECTIVE or the DISABLE_SELECTIVE global caching mode has been set. With the UNSPECIFIED cache mode setting, provider-specific rules apply. Since caching is handled in such a minimalistic and rather abstract way, various vendor-specific extensions exist.

The approach recommended in the Hibernate documentation is delegating the caching functionality to specialized caching tools¹². Thus, Hibernate integrates with the most popular caching frameworks, like EhCache or Hazelcast, simply by setting the configuration property hibernate.cache.region.factory_class to an appropriate cache region factory of the selected caching framework.

OpenJPA comes with its own data cache implementation¹³, which can be turned on using the openjpa.DataCache configuration property. OpenJPA also contains a transaction-events notification framework¹⁴, which can be used for cache synchronization between nodes in a distributed environment.

EclipseLink also contains an integrated second-level cache implementation and does not rely on any third-party framework. Like OpenJPA, it supports caching in a clustered environment; using the configuration property eclipselink.cache.coordination.protocol it is possible to specify which protocol should be used for cache coordination between nodes. The options described in the documentation [13] are RMI and JMS.
12. Hibernate developer guide, section 21.2
13. There is a plug-in integrating EhCache with OpenJPA, but since it is not even mentioned in the official documentation, I will not discuss it further.
14. OpenJPA user’s guide, section 12.2


To sum up the caching solutions used in the JPA implementations: Hibernate tries to delegate caching to specialized tools, whereas OpenJPA and EclipseLink integrate second-level caching into the core of the framework. The undeniable advantage of Hibernate’s approach is that it does not try to reinvent the wheel. Since specialized caching frameworks are already mature and sophisticated, it is generally a good idea to utilize them as general second-level JPA cache solutions. A slight disadvantage, however, is that a third-party framework is required, which might increase the overall complexity of the architecture.
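How the shared-cache-mode value and the @Cacheable annotation combine can be sketched as a small decision function. This is only an illustration of the rules described in section 3.7 of the JPA 2.0 specification, not code from any provider. The enum Mode merely mirrors the five values listed above, and the class name CacheModeSketch is hypothetical.

```java
import java.util.Optional;

/** Illustrative decision logic for shared-cache-mode combined with @Cacheable. */
final class CacheModeSketch {

    /** Mirrors the five possible shared-cache-mode values. */
    enum Mode { ALL, NONE, ENABLE_SELECTIVE, DISABLE_SELECTIVE, UNSPECIFIED }

    private CacheModeSketch() { }

    /**
     * @param mode      the global shared-cache-mode setting
     * @param cacheable the value of @Cacheable on the entity, or null if the annotation is absent
     * @return whether the entity is cached, or empty if the provider-specific default applies
     */
    static Optional<Boolean> isCached(Mode mode, Boolean cacheable) {
        switch (mode) {
            case ALL:
                return Optional.of(true);   // the annotation is ignored
            case NONE:
                return Optional.of(false);  // the annotation is ignored
            case ENABLE_SELECTIVE:
                // only entities explicitly marked @Cacheable(true) are cached
                return Optional.of(Boolean.TRUE.equals(cacheable));
            case DISABLE_SELECTIVE:
                // everything is cached except entities marked @Cacheable(false)
                return Optional.of(!Boolean.FALSE.equals(cacheable));
            default:
                return Optional.empty();    // UNSPECIFIED: the provider decides
        }
    }
}
```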

2.6 Entity lifecycle and transactional events

The JPA specification defines seven entity lifecycle events: pre/post-persist, pre/post-remove, pre/post-update and post-load; and two ways of listening to such events: either by adding the appropriate lifecycle annotation to an entity method or by specifying an EntityManagerFactory-scoped callback listener¹⁵. Even though these callbacks should be fine in most cases, all vendors provide alternative ways of listening to events in the persistence layer.

Hibernate has the concept of interceptors¹⁶. Interceptors can be either Session-scoped or SessionFactory-scoped and provide some additional callbacks, but with the limitation that there can be only one per Session/SessionFactory. Hibernate also contains an event architecture, which is superior to the interceptor capabilities and can be used to listen to even more fine-grained events raised by the Hibernate session. All supported events are contained in the EventType enum.

OpenJPA supports listening to transaction-related events via instances of the TransactionListener interface. Such instances are registered at OpenJPAEntityManagerSPI, an OpenJPA-specific extension of EntityManager. Besides transaction-related events, it is also possible to register a LifecycleListener, which supports callbacks additional to those from the JPA2 standard, such as DetachListener for notification when an entity becomes detached [14]. However, the documentation does not mention this interface at all, so it might not be intended for public use or might be changed in future releases.

EclipseLink, on the other hand, contains support for events which occur in the session (the EclipseLink-specific implementation of EntityManager). There is an interface SessionEventListener, which can be registered with a session to receive notifications about the following session-related events:
• pre/post-transaction commit
• pre/post-transaction rollback
• pre/post-query execution
• the descriptor for an entity being persisted is missing
• ...

15. Java™ Persistence API, Version 2.0, section 3.5
16. Hibernate developer guide, section 14

2.7 Schema generation

In order to keep the specification clean and simple¹⁷, JPA does not force vendors to generate the database schema¹⁸. However, the authors of the specification took schema generation into account and integrated various mapping metadata which can be used during schema generation, such as the nullable or unique annotation properties.

All the compared implementations provide schema generation functionality. Generated statements can either be sent directly to the database or saved into a file for manual execution. DROP statements can also be generated, so during development it is possible to have all tables removed and generated again on each startup. This ensures that the application always starts in the same state, with an empty database. However, even though automatic schema generation might be convenient in the development phase, it is often a source of problems when dealing with incremental schema upgrades. Therefore, it is generally recommended to avoid it in production environments when possible and rather use database migration tools like Liquibase¹⁹, Flyway²⁰ or MyBatis Migrations²¹.

In Hibernate, it is possible to turn on schema generation using the configuration property hibernate.hbm2ddl.auto, which can be set to one of the following values: validate, update, create and create-drop. Then, before the EntityManagerFactory is created, Hibernate generates DDL statements and executes them on the database. If this is not the desired behaviour, there is a SchemaExport class in the Hibernate distribution, which is responsible for generating the schema and can be executed manually (either programmatically or via the command line) to have the DDL statements generated into a file.

OpenJPA can also be configured to automatically generate the database schema on application startup by setting the configuration property openjpa.jdbc.SynchronizeMappings to buildSchema. As in Hibernate, the schema generation can be triggered manually, using the MappingTool utility. However, OpenJPA goes even further in this area and, as the only one of the compared implementations, also allows generating an object model from an existing database schema²². This feature is usually only provided by advanced UML modelling tools, such as Enterprise Architect from Sparx Systems²³.

In EclipseLink, schema generation can be turned on using the persistence.xml property eclipselink.ddl-generation. There is also an option to have the DDL statements not only executed directly on the database, but also exported into a file. This is done by setting the configuration property eclipselink.create-ddl-jdbc-file-name to a target file path and eclipselink.ddl-generation.output-mode to both.

17. Another reason why the specification does not contain any details of schema generation is that it is a very vendor-specific issue. The specification would have to describe what the tables generated from entities should look like on a particular database platform, which datatypes or constraints should be used, etc. Since all these elements differ greatly from database to database, the specification avoids the topic entirely.
18. Java™ Persistence API, Version 2.0, page 355
In appendix A, I provide DDL statements of the schema used in the sample benchmark application from chapter 2.3, automatically generated by Hibernate, OpenJPA and EclipseLink, respectively.
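The configuration properties named in this section can be collected in one place. The sketch below builds the property map for each provider, using only the property names mentioned above. The value drop-and-create-tables for eclipselink.ddl-generation and the file name create.sql are assumptions chosen for illustration, as is the class name SchemaGenerationProperties.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Collects the provider-specific schema generation properties discussed in the text. */
final class SchemaGenerationProperties {

    private SchemaGenerationProperties() { }

    /** @param provider one of "hibernate", "openjpa" or "eclipselink" */
    static Map<String, String> forProvider(String provider) {
        Map<String, String> props = new LinkedHashMap<String, String>();
        switch (provider) {
            case "hibernate":
                // drop and recreate the schema on every startup
                props.put("hibernate.hbm2ddl.auto", "create-drop");
                break;
            case "openjpa":
                props.put("openjpa.jdbc.SynchronizeMappings", "buildSchema");
                break;
            case "eclipselink":
                // assumed value; the text names the property but not a value
                props.put("eclipselink.ddl-generation", "drop-and-create-tables");
                // "both" = execute on the database and write the DDL to a file
                props.put("eclipselink.ddl-generation.output-mode", "both");
                // hypothetical target file name
                props.put("eclipselink.create-ddl-jdbc-file-name", "create.sql");
                break;
            default:
                throw new IllegalArgumentException("Unknown provider: " + provider);
        }
        return props;
    }
}
```

Such a map could be passed as the second argument of Persistence.createEntityManagerFactory(String, Map) instead of listing the properties in persistence.xml.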

19. http://www.liquibase.org/
20. http://code.google.com/p/flyway/
21. http://code.google.com/p/mybatis/
22. OpenJPA user’s guide, section 7.2
23. http://www.sparxsystems.com.au/


2.8 Support for stored procedures

A stored procedure is a group of SQL statements that is used to encapsulate a set of operations or queries to execute on a database server [15]. The JPA standard does not mention stored procedure support at all. However, stored procedures are executed using ordinary SQL, so as long as the stored procedure returns nothing or a result set (which is properly mapped to entities using @SqlResultSetMapping on the JPA side²⁴), it can be called using JPA native queries with all JPA providers. EclipseLink, however, contains some additional, beyond-the-standard features for accessing stored procedures [13]. Let us consider the following very simple MySQL stored procedure:
    CREATE PROCEDURE `calculate_item_count`(OUT result INT)
    BEGIN
        select count(item_id) into result from Item;
    END

Listing 2.2: Sample stored procedure

The stored procedure does nothing more than calculate the number of items in the table Item and store it in the output parameter result. In EclipseLink, it is possible to call a stored procedure using StoredProcedureCall as follows:
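    // StoredProcedureCall: org.eclipse.persistence.queries.StoredProcedureCall
    // Session: org.eclipse.persistence.sessions.Session
    EntityManager em = emf.createEntityManager();

    // Describe the call by the procedure name only; no SQL string is needed
    StoredProcedureCall spc = new StoredProcedureCall();
    spc.setProcedureName("calculate_item_count");
    spc.addNamedCursorOutputArgument("result");

    em.getTransaction().begin();
    // Unwrap the native EclipseLink session and execute the call on it
    Session s = em.unwrap(Session.class);
    s.executeSelectingCall(spc);
    em.getTransaction().commit();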

24. Java™ Persistence API, Version 2.0, section 3.8.15

Using this approach, it is not necessary to deal with SQL directly in stored procedure calls; the procedure is executed by its name only. Besides StoredProcedureCall, EclipseLink also provides an annotation-based approach, similar to named queries. Using the annotation @NamedStoredProcedureQuery, it is possible to define a named stored procedure call, which can later be executed in a similar manner to a standard @NamedQuery or @NamedNativeQuery:
    // Named queries are looked up by name with createNamedQuery
    Query query = entityManager.createNamedQuery("queryName");
    query.getResultList();

So, having defined the stored procedure from listing 2.2, the named stored procedure call definition would look as follows:
    @NamedStoredProcedureQuery(
        name = "getItemCount",
        procedureName = "calculate_item_count",
        parameters = @StoredProcedureParameter(
            queryParameter = "result",
            name = "result",
            direction = Direction.OUT))

EclipseLink has the best support for stored procedures of all the compared JPA implementations. Using Hibernate, the named native query definition for a stored procedure call would look like this:
@NamedNativeQuery(name = "getItemCountHibernateWay", query = "? = call calculate_item_count()")

However, this code for the procedure calculate_item_count, as defined in listing 2.2, would fail because Hibernate does not yet support native scalar queries; the code would lead to a NotYetImplementedException. OpenJPA also claims to support stored procedures²⁵, but I was unable to get any of them running.
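Since the procedure from listing 2.2 returns its result in an OUT parameter rather than in a result set, it may help to see what reading such a parameter looks like one level below JPA. The following is only a sketch of a plain JDBC call, assuming a java.sql.Connection to the MySQL database is already available. The class name ItemCountProcedure is hypothetical, and the sketch is not part of the benchmark or migration code.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

/** Calls the procedure from listing 2.2 through plain JDBC. */
final class ItemCountProcedure {

    /** JDBC escape syntax for a procedure call with a single parameter. */
    static final String CALL = "{call calculate_item_count(?)}";

    private ItemCountProcedure() { }

    /** Executes the procedure and returns the value of its OUT parameter. */
    static int itemCount(Connection connection) throws SQLException {
        try (CallableStatement call = connection.prepareCall(CALL)) {
            // Parameter 1 is the OUT parameter "result" declared as INT
            call.registerOutParameter(1, Types.INTEGER);
            call.execute();
            return call.getInt(1);
        }
    }
}
```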

2.9 Integrating with other frameworks

Another important decision factor for choosing a JPA implementation is its support for other frameworks which would extend its capabilities even further.
25. It is very briefly mentioned in the documentation, without any examples or further details.


Hibernate is known for its excellent integration with various third-party frameworks. Hibernate is not just a persistence framework; it is a sort of ecosystem with many frameworks built around Hibernate ORM. In chapter 2.5, I already discussed its integration with caching frameworks like EhCache or Hazelcast. Two other important and useful extensions are Hibernate Search and Hibernate Validator.

Hibernate Validator is the reference implementation of the specification JSR 303: Bean Validation [16]. JSR 303 is a standard API for declarative validation, which JPA already supports for validating entities upon lifecycle events²⁶. The configuration is done via the property javax.persistence.validation.mode in persistence.xml. The possible values are AUTO, CALLBACK or NONE. Hibernate provides an additional option, DDL. When the value of this property is set to either AUTO or DDL and Bean Validation is present on the classpath, the validation metadata are also used in schema generation. When an entity attribute is decorated with one of the supported annotations from this specification, Hibernate reflects such a validation constraint in the generated schema. For example, if an attribute of type long is annotated with the JSR-303 annotation @Max(30), the Hibernate schema generator adds a database check constraint to ensure that values in the column are less than or equal to 30. There are more supported JSR-303 annotations, such as @Min, @NotNull, @Size, etc. [17]²⁷

Hibernate Search is a framework built on top of Apache Lucene, aiming to provide full-text search capabilities to the domain model. It works with both the native Hibernate API and Hibernate EntityManager.

2.10 Licenses
When a company is considering implementing its enterprise solution on top of an ORM framework, the character of the framework’s license is very important. Fortunately, all of the compared frameworks are released under permissive and business-friendly licenses. Hibernate uses LGPL v2.1²⁸, OpenJPA uses Apache License v2.0²⁹ and EclipseLink is dual-licensed under Eclipse Public License v1.0 and Eclipse Distribution License v1.0³⁰. Dual licensing means that users can choose which license suits their business needs best.

26. Java™ Persistence API, Version 2.0, section 3.6
27. Hibernate Validator reference documentation, section 2.4.1

2.11 Documentation quality
During my work on this thesis I read the documentation extensively on each topic discussed here and, therefore, I am in a position to provide some comparison of the documentation quality of all the frameworks.

Hibernate has high-quality documentation. With every new version of the framework, there is a new document describing its features, it is possible to find documentation of older releases, and everything is clearly explained with examples. Since Hibernate is the most widely used implementation, there are also a lot of examples all over the internet and users can ask questions in discussion forums³¹.

OpenJPA also has high-quality documentation, in which everything can be found. However, it is unfortunately not as rich in examples as Hibernate’s; it is purely a reference guide. Like every Apache project, OpenJPA provides mailing lists for users’ questions.

For EclipseLink, there is quite a lot of documentation, but its problem is fragmentation. Indeed, it is possible to find answers to most questions, but it took me a significant amount of time compared to Hibernate or OpenJPA. Moreover, since the documentation is not versioned, it is sometimes fairly difficult to distinguish which functionality is new in the EclipseLink implementation of JPA2 and which was already present in EclipseLink JPA1. EclipseLink also has its own support forums³².

28. http://www.hibernate.org/license
29. http://openjpa.apache.org/license.html
30. http://wiki.eclipse.org/EclipseLink/FAQ/General
31. http://forums.hibernate.org
32. http://www.eclipse.org/forums/index.php/f/111


2.12 Build systems
Nowadays, Maven can be considered the de facto industry standard build and project management tool. EclipseLink and OpenJPA both use it for their builds. Hibernate, however, switched in version 4.0 to Gradle, a build tool whose build scripts are written in Groovy. The Hibernate creators switched to Gradle hoping to simplify the build process and get rid of various other build-related problems³³.

2.13 Summary
In this chapter I discussed various vendor-specific extensions of the JPA standard. I summarize it in the following brief feature matrix:

Feature                                    Hibernate    OpenJPA    EclipseLink
Custom identifier generation               Yes          Yes        Yes
Performance in the default configuration   2.           3.         1.
Built-in event notification framework      Powerful     Yes        Yes
Support for custom types                   Yes          Yes        Yes
Schema generation                          Yes          Powerful   Yes
Caching                                    Third-party  Custom     Custom
Nonstandard stored procedures support      No           No         Yes
Third-party frameworks support             Yes          No         No
Documentation quality                      High         High       Moderate
Business-friendly license                  Yes          Yes        Yes

Table 2.8: Feature matrix

It is important to note that the list of features presented in this chapter is by no means exhaustive; I have only described important features which are most likely to be used in “real-world” applications using JPA. All implementations contain additional, more or less corner-case features or performance enhancements, the description of which would be beyond the scope of this thesis.

33. A deeper explanation of the reasons behind the transition to Gradle can be found at https://community.jboss.org/wiki/GradleWhy.


3 Experimental migration of JPA applications
In this chapter, I will take sample applications of all compared implementations, migrate them to the other two and test them on PostgreSQL, MySQL and Oracle. I will try to use standard features where possible and fall back to vendor-specific extensions only when necessary. Features which can neither be migrated to the standard API nor have counterparts in the other frameworks will be removed entirely. All migrated projects were tested on MySQL 5.5.22, PostgreSQL 9.1.3 and Oracle XE 11.2. Since all the projects use automatic schema generation, there were no real issues with database portability; the only exception was identifier generation. Both PostgreSQL and Oracle support sequences, so I could use the sequence generation strategy there. In MySQL, however, due to the missing support for sequences, I had to switch to the table generation strategy¹.

3.1 Migrating from Hibernate

Since Hibernate does not ship with any “official” reference application, I searched SourceForge² for projects using Hibernate which would be suitable candidates for experimental migration. In the end, I decided on open-forum³ because it is already managed by Maven and has a convenient and easily understandable project structure. Open-forum describes itself as “an opensource forum engine written in Java”. The project is not yet finished and quite a lot of functionality is still not implemented, but as an example project for experimental migration it is fine.

Open-forum uses the Spring framework, JSF2 and Hibernate. An interesting fact is that it does not use JPA at all, but rather relies on the native Hibernate API. Therefore, it will be necessary to migrate the complete persistence logic to the standard API. The project also depends on Hibernate Search, but the search functionality is not yet implemented. If it were, however, it would not be possible to migrate the project at all, because Hibernate Search does not work with any JPA provider other than Hibernate, as I already discussed in section 2.9.

The migration of this project consists of the following steps:
• Entirely remove the Hibernate dependency from the Maven configuration file.
• Modify the Spring configuration files to bootstrap the JPA entity manager factory instead of the Hibernate session factory. This is done via the LocalContainerEntityManagerFactoryBean bean defined in the Spring application context.
• Change Hibernate-specific entity annotations. Since the project was already using standard annotations from the package javax.persistence where possible, it was only necessary to migrate the @LazyCollection annotation, which I replaced with the fetch attribute of the corresponding @ManyToMany annotations.
• Replace classes from the native Hibernate API, such as Query or Session, with their standard counterparts.
• Rewrite HQL queries, because their syntax differs in some aspects from JPA QL.
• Create the persistence.xml configuration file and properly define a new persistence unit and its properties.

1. I might have chosen the AUTO generation strategy, which automatically determines the right one for the particular database, but since I prefer to have control over what is going to be generated, I configured the generation strategy manually.
2. http://www.sourceforge.net/
3. http://sourceforge.net/projects/open-forum/

The migration of this project went without any serious complications. It was not even necessary to use any vendor-specific extensions (with the exception of a different configuration property for schema generation).


3.2 Migrating from OpenJPA

Unlike for Hibernate, there are two sample applications for OpenJPA available at its website⁴: OpenBooks and OpenTrader. I first tried to migrate OpenTrader, which is a GWT web application. Unfortunately, I was unable to get even its original OpenJPA-based version running, due to some GWT-related errors. Therefore, I will migrate OpenBooks in this section.

OpenBooks uses Maven for dependency management. Therefore, the first step of the migration process was removing all OpenJPA-related dependencies and adding Hibernate EntityManager. After I imported the project into the IDE and tried to compile it, I received more than a hundred compilation errors. OpenBooks unfortunately relies heavily on the native OpenJPA API.

The next task was migrating the JPA configuration in persistence.xml. I changed the provider of the persistence unit to Hibernate, removed OpenJPA-specific configuration options and replaced them with their Hibernate counterparts where possible. The configuration options not applicable in Hibernate, such as automatic enhancement of persistent classes, were removed entirely.

Some errors were related to missing metamodel classes required for the type-safe criteria API⁵, a new feature introduced in JPA2. The original OpenJPA-based project used an Ant task to generate such classes, so in its migrated version, I also had to generate the metamodel classes. There is a subproject of Hibernate called Hibernate Metamodel Generator⁶ which provides an annotation processor exactly for this purpose. The metamodel generator can be run as an Ant task, as a Maven plugin or simply from the command line. Since my IDE supports triggering annotation processors at compile time, I let the IDE run it and thus did not have to execute any Ant tasks or configure additional Maven plugins.

Another group of compilation errors complained about missing OpenJPA-specific classes on the classpath. Since the number of missing classes was very large and the application API was heavily relying on them, I came to the conclusion that the OpenBooks application cannot be migrated to Hibernate (or any other JPA provider) without rewriting the major part of the code and reconsidering its design concepts. It is simply not possible to replace the missing OpenJPA classes with alternatives from Hibernate which would do more or less the same thing. OpenJPA-specific classes are spread throughout the application code, classes using them depend on other classes using them, and removing a specific class or method from the application results in a long chain of broken dependencies.

The lesson learned from the migration of OpenBooks to Hibernate can be useful when designing new applications based on JPA. If there is even the slightest chance that it might be necessary to migrate the project to a different JPA provider in the future, the developers should try hard to stick with the standard API and avoid vendor-specific extensions when possible. Otherwise, the migration process will likely be painful and costly, because the native APIs of different JPA providers do not work as simple replacements for each other. Each JPA provider comes with some concepts in its native API which cannot simply be emulated in a different JPA provider.

4. http://openjpa.apache.org/samples.html
5. Java™ Persistence API, Version 2.0, section 6.4
6. http://www.hibernate.org/subprojects/jpamodelgen.html

3.3 Migrating from EclipseLink

As a sample application for the migration from EclipseLink, I will use the jpa.employee sample, which can be found in the EclipseLink SVN repository⁷. It is a project focused on the management of employees and projects, and it uses JSF as the frontend technology and EJB for the service tier. Its domain model is not very complicated, but it is perfectly fine for showing interesting JPA and EclipseLink features. In order to work with the application (and later with its migrated copies) in a convenient way, I first had to polish it a bit (add Maven for builds and dependency management, refactor the project structure, etc.). However, I was not able to successfully start the application on various application servers⁸ until I installed JBoss AS7, where the deployment process was entirely painless.
7. http://dev.eclipse.org/svnroot/rt/org.eclipse.persistence/trunk/examples
8. I tried to run it on Glassfish and Apache TomEE.


Since the project uses a container-managed datasource, there was no need to configure it in the application; I just configured the datasource in the JBoss administration console⁹. During the migration, I had to rewrite or change the following parts of the code:
• The sample application used the EclipseLink-specific query hint QueryHints.FETCH, which “configures the query to optimize the retrieval of the related objects, the related objects will be joined into the query instead of being queried independently” [11]. Since there is no direct alternative in either Hibernate or OpenJPA, I removed this optimization hint entirely.
• There was a very EclipseLink-specific way of capturing logged SQL statements (using Java reflection). I removed this code entirely and introduced a cleaner way of logging SQL statements using the org.hibernate.SQL logger. In OpenJPA, there is a configuration property openjpa.Log, which is used for logging SQL statements.
• I had to remove the eclipselink-orm.xml configuration file, which is also EclipseLink-specific. It contained various named query definitions (which I declared using annotations on the entity level), some additional persistence unit configuration and, most importantly, entity mapping metadata, which I added to the entities using the standard annotation-based approach.
• The application was saving the gender field of employees in the format M and F for males and females, respectively. It was using the EclipseLink-specific object-type-converter to convert such strings stored in the database into instances of Gender on the Java side. Since such a conversion is not possible in pure JPA, I had to use vendor-specific extensions of Hibernate and OpenJPA.
  – In Hibernate, I implemented the UserType interface to create a custom type converter, which is responsible for serializing the Java enum into its database VARCHAR representation and back.
  – In OpenJPA, I annotated the gender field with @Externalizer("toString") and @Factory, as described in section 2.4. Since I did not specify any additional parameter for @Factory, the constructor will be called to obtain a new Gender instance.

9. I find such a configuration very convenient, because it is no longer necessary to bundle the database driver with the application; the application just expects that the connection is there.

For OpenJPA, I enabled unenhanced persistent classes at runtime using the configuration property openjpa.RuntimeUnenhancedClasses. Even though the classes should be enhanced automatically during deployment on the JBoss application server, I was receiving an error that they were not. Although I could have run compile-time enhancement using Ant or Maven (as I did in the performance benchmark), I decided to simply allow the classes to be unenhanced in this case.
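The OpenJPA variant of the gender conversion relies on two plain methods of the value type: one that produces the database string and one that rebuilds the Java value from it. A minimal sketch of such a type is shown below. The constant names MALE and FEMALE and the methods toCode and fromCode are hypothetical; the migrated application used toString and the constructor instead, as described above.

```java
/** Sketch of a value type stored in the database as "M" or "F". */
enum Gender {
    MALE("M"),
    FEMALE("F");

    private final String code;

    Gender(String code) {
        this.code = code;
    }

    /** Externalizer-style method: Java value to database string. */
    public String toCode() {
        return code;
    }

    /** Factory-style method: database string to Java value. */
    public static Gender fromCode(String code) {
        for (Gender gender : values()) {
            if (gender.code.equals(code)) {
                return gender;
            }
        }
        throw new IllegalArgumentException("Unknown gender code: " + code);
    }
}
```

With a type like this, @Externalizer would name the instance method and @Factory the static method, following the mechanism described in section 2.4.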

3.4 Migration summary

In this chapter, I migrated various applications between different JPA implementations. Two out of three migrations were successful, one was not. Surprisingly, the unsuccessful one led to probably the most interesting conclusion, because it shows that migrating an application between JPA providers is not easy if it heavily utilizes the native API.


4 Automatic migration tool
The practical part of the thesis is an application which supports migrating existing projects from OpenJPA and EclipseLink to Hibernate. I call it JPA Migration and it is written in Scala.

4.1 The application architecture

The architecture of the application was designed with an emphasis on extensibility. The migration of a project is divided into migrating individual source files using so-called processors. They are responsible for locating various vendor-specific features and displaying hints on how such features could be migrated to Hibernate¹. If there is a direct counterpart in Hibernate, the feature is migrated automatically². For both supported JPA implementations (OpenJPA, EclipseLink), there are several processors configured in the application configuration. All processors mix in³ the trait Processor, which defines two abstract methods:
• appliesFor(filename: String) returns a boolean value indicating whether the processor should be executed for a file with a particular filename.
• parse(source: String) receives the file content as a function argument, analyses it, migrates the constructs it is responsible for, and returns a tuple, where the first element is the new migrated source⁴ and the second element is a list of instances of MigrationTaskResult, each of which describes a single step performed during the migration.

1. If it is possible to migrate a vendor-specific feature directly to the standard API, this will be preferred over migrating to Hibernate.
2. Currently, only standardized JPA persistence.xml properties and some query hints whose semantics are the same in multiple frameworks are migrated automatically.
3. Mixing in a Scala trait is a kind of alternative to implementing an interface in Java.
4. When no migration is done, the returned source is the same as the input.


There are two additional abstract classes, PropertiesXmlProcessor and ClassProcessor, which both mix in Processor and serve as base classes of all processors working with XML files or Java source files, respectively. They provide various convenience methods for all the processors of the same kind. Figure 4.1 shows the class diagram describing all the classes participating in the migration process.

The migration process itself consists of the following steps:

1. The application starts. It receives four command-line arguments:
   • the project directory
   • an output directory
   • the location of the file to which the migration report should be written
   • the name of the current JPA provider (eclipselink or openjpa)
2. The arguments are validated (e.g. whether the project directory exists and whether the output directory is writeable; some additional access rights are checked as well).
3. The migration itself begins. The input directory is copied as a whole to the output destination.
4. The list of available processors for the particular JPA implementation is loaded from the application configuration.
5. The output directory is walked recursively and, for each file, all processors whose appliesFor() method returns true are executed in sequence⁵; each consecutive processor receives the output of the previous processor as its input argument. In this way the source code is migrated step by step, and the lists of MigrationTaskResult instances (the second element of the tuple returned by the parse() method) are concatenated, forming a complete migration report for the particular file, represented by an instance of the class Report.
6. The migrated source code is written back to the source file.
7. The instances of Report created during the migration of the individual source files are collected and passed to the templating engine Scalate, which creates an HTML report using a predefined template.

⁵ That is, their parse() method is executed.
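The chaining in step 5 can be sketched as a left fold over the applicable processors. This is an illustration under assumptions rather than the tool's actual code: migrateFile and the toy Replace processor are hypothetical names, and the Processor and MigrationTaskResult types are restated (with an assumed description field) only to keep the block self-contained.

```scala
object PipelineSketch {
  // Restated from the description in the text; the field is an assumption.
  case class MigrationTaskResult(description: String)

  trait Processor {
    def appliesFor(filename: String): Boolean
    def parse(source: String): (String, List[MigrationTaskResult])
  }

  // Hypothetical helper: runs all applicable processors in order. Each one
  // receives the output of the previous one; the result lists are concatenated.
  def migrateFile(filename: String,
                  source: String,
                  processors: List[Processor]): (String, List[MigrationTaskResult]) =
    processors
      .filter(_.appliesFor(filename))
      .foldLeft((source, List.empty[MigrationTaskResult])) {
        case ((current, results), processor) =>
          val (migrated, newResults) = processor.parse(current)
          (migrated, results ++ newResults)
      }

  // Toy processor used only to demonstrate the chaining.
  class Replace(suffix: String, from: String, to: String) extends Processor {
    def appliesFor(filename: String): Boolean = filename.endsWith(suffix)

    def parse(source: String): (String, List[MigrationTaskResult]) =
      if (source.contains(from))
        (source.replace(from, to), List(MigrationTaskResult(s"$from -> $to")))
      else
        (source, Nil) // nothing migrated: the source is returned unchanged
  }
}
```

With the processors Replace(".xml", "a", "b") and Replace(".xml", "b", "c"), migrating a file persistence.xml with the content "a" yields "c" together with two report entries, because the second processor sees the output of the first.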

4.2 Java source files parsing

While parsing the XML configuration files is rather straightforward and involves only simple DOM tree manipulation, parsing and analysing Java source files is much more complicated. It is necessary to obtain an abstract representation of a Java source file, an abstract syntax tree (AST). The AST is then recursively searched for occurrences of vendor-specific constructs, such as types, annotations or imports. For creating AST representations of Java source files, I use the library javaparser⁶.

In listing 4.1, I demonstrate how Scala pattern matching [18] is used to recursively browse an AST and search for a specific annotation. The method receives three arguments: an instance of CompilationUnit, which is an abstract representation of a single source file; the name of the annotation to search for; and a closure to call for every annotation found. The method recursively visits all types (classes or interfaces) and their members (fields, constructors, methods and inner classes) and checks whether they are declared with the given annotation. If such an annotation is found, the closure is called with the found annotation as its argument. This solution is very flexible: the client just passes in an arbitrary closure, which is guaranteed to be called for all matching annotations found. It is the responsibility of the caller⁷ to do something “useful” with it.

The very same approach is used to search for occurrences of vendor-specific types in an AST, so pattern matching and recursive functions are of great use here again. I browse the AST from the top⁸, pattern match on the class of the current statement and recursively “unwrap” simpler statements out of the more complex ones. For example, when parsing an if statement, I recursively parse its condition, if block and else block. When a vendor-specific type is encountered at any point, the closure passed to the parsing function is called with the found type as its argument.

⁶ http://code.google.com/p/javaparser/
⁷ The callers in the current architecture are the processors.
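    // Imports assumed for the javaparser 1.x release referenced in the text
    // (package japa.parser) and for a Scala version that ships JavaConversions.
    import scala.collection.JavaConversions._  // lets foreach work on java.util.List
    import japa.parser.ast.CompilationUnit
    import japa.parser.ast.body.{BodyDeclaration, ClassOrInterfaceDeclaration,
      ConstructorDeclaration, FieldDeclaration, MethodDeclaration}
    import japa.parser.ast.expr.AnnotationExpr

    def processAnnotations(unit: CompilationUnit, annotation: String,
                           closure: AnnotationExpr => Unit): Unit = {
      // Start with the top-level types declared in the source file.
      if (unit.getTypes != null) unit.getTypes.foreach(handleMember)

      def handleMember(element: BodyDeclaration): Unit = {
        element match {
          // Classes and interfaces: check their own annotations, then recurse.
          case x: ClassOrInterfaceDeclaration =>
            handleAllowedType()
            // Defensive null check, mirroring the checks for types and annotations.
            if (x.getMembers != null) x.getMembers.foreach(handleMember)
          // Fields, constructors and methods may be annotated but are not descended into.
          case _: FieldDeclaration | _: ConstructorDeclaration | _: MethodDeclaration =>
            handleAllowedType()
          // Any other kind of declaration is ignored.
          case _ =>
        }

        // Calls the closure for every annotation on `element` whose name matches.
        def handleAllowedType(): Unit = {
          if (element.getAnnotations != null)
            element.getAnnotations.foreach(x =>
              if (x.getName.getName == annotation) closure(x))
        }
      }
    }

Listing 4.1: Recursively searching the abstract syntax tree for vendor-specific annotations using Scala pattern matching

⁸ This means that the parsing starts with the whole source file, then recurses into all the types it contains, then into all their fields, methods and constructors, which contain various language constructs such as ifs, loops, anonymous classes, returns, etc., down to atomic expressions like string literals or null expressions.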
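The search for vendor-specific types, which recurses from complex statements into simpler ones, can be illustrated on a toy AST. The sketch below deliberately does not use javaparser: the Node hierarchy, findTypes and the isVendorSpecific predicate are hypothetical and cover only a handful of constructs, so it shows the pattern-matching technique rather than the actual implementation.

```scala
object TypeSearchSketch {
  // Hypothetical miniature AST, far smaller than the real javaparser model.
  sealed trait Node
  case class TypeRef(name: String) extends Node
  case class StringLiteral(value: String) extends Node
  case class VarDecl(tpe: TypeRef, name: String, init: Option[Node]) extends Node
  case class MethodCall(target: Option[Node], name: String, args: List[Node]) extends Node
  case class IfStmt(condition: Node, thenBlock: Node, elseBlock: Option[Node]) extends Node
  case class Block(statements: List[Node]) extends Node

  // Walks the tree from the top and calls `closure` for every type reference
  // that the predicate classifies as vendor-specific.
  def findTypes(root: Node, isVendorSpecific: String => Boolean)
               (closure: TypeRef => Unit): Unit = {
    def visit(node: Node): Unit = node match {
      case t: TypeRef =>
        if (isVendorSpecific(t.name)) closure(t)
      case VarDecl(tpe, _, init) =>
        visit(tpe)
        init.foreach(visit)
      case MethodCall(target, _, args) =>
        target.foreach(visit)
        args.foreach(visit)
      // An if statement is unwrapped into its condition, if block and else block.
      case IfStmt(condition, thenBlock, elseBlock) =>
        visit(condition)
        visit(thenBlock)
        elseBlock.foreach(visit)
      case Block(statements) =>
        statements.foreach(visit)
      // Atomic nodes end the recursion.
      case _: StringLiteral => ()
    }
    visit(root)
  }
}
```

Because Node is sealed, the compiler can warn when a newly added node type is not handled in visit, which is helpful when the set of supported constructs grows.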

4.3 Ideas for a further development

The application is written in a clean functional style, with an emphasis on extensibility and code clarity. Migrating additional constructs is just a matter of writing a processor (which is very simple if it extends one of the abstract classes that provide support for manipulating the DOM or browsing an AST) and plugging it into the application by configuring the class name of the processor in the config.properties configuration file. The current implementation supports migrating almost all the features described in chapter 2. However, as I noted earlier, both JPA implementations contain additional features that were not discussed and whose parsing is currently not implemented. Future development might therefore focus on parsing and migrating these additional vendor-specific features.

An additional problem is that the parser used for creating the AST supports Java only up to version 5. Therefore, parsing Java source files that contain newer language constructs (such as the diamond operator or the try-with-resources statement, both introduced in Java 7) leads to parsing errors. Unfortunately, this is not easy to fix, because it would require replacing the parser and very likely also changing most of the processing logic.
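To give a feel for how little code a new DOM-based processor might need, the following sketch renames vendor-specific persistence.xml properties to their standard JPA 2 names using only the JDK's DOM API. It is an assumption-laden illustration: PersistenceXmlSketch, migrate and the contents of the renames map are hypothetical and do not reproduce the convenience methods of PropertiesXmlProcessor or the mappings the tool really applies.

```scala
import java.io.{StringReader, StringWriter}
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.transform.{OutputKeys, TransformerFactory}
import javax.xml.transform.dom.DOMSource
import javax.xml.transform.stream.StreamResult
import org.w3c.dom.Element
import org.xml.sax.InputSource

object PersistenceXmlSketch {
  // Illustrative mapping only: OpenJPA property names and standard JPA 2 names.
  val renames: Map[String, String] = Map(
    "openjpa.ConnectionURL"        -> "javax.persistence.jdbc.url",
    "openjpa.ConnectionDriverName" -> "javax.persistence.jdbc.driver"
  )

  // Returns the migrated XML and one message per renamed property.
  def migrate(xml: String): (String, List[String]) = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val doc = builder.parse(new InputSource(new StringReader(xml)))

    // Visit every <property> element and rename it if a mapping exists.
    val properties = doc.getElementsByTagName("property")
    val messages = (0 until properties.getLength).toList.flatMap { i =>
      val element = properties.item(i).asInstanceOf[Element]
      val oldName = element.getAttribute("name")
      renames.get(oldName).map { newName =>
        element.setAttribute("name", newName)
        s"renamed $oldName to $newName"
      }
    }

    // Serialize the modified DOM tree back to a string.
    val transformer = TransformerFactory.newInstance().newTransformer()
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes")
    val out = new StringWriter()
    transformer.transform(new DOMSource(doc), new StreamResult(out))
    (out.toString, messages)
  }
}
```

Properties without an entry in the map are left untouched, so the sketch is safe to run on files that mix standard and vendor-specific settings.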


Figure 4.1: Class diagram of the migration application

5 Conclusion
In the theoretical part of the thesis, I compared the most popular JPA implementations according to various criteria I selected. The expected output of any comparison is a ranking indicating who the winners and the losers are. Unfortunately, creating such a ranking is not possible in the case of JPA implementations. Each implementation has its strengths and weaknesses, and the choice heavily depends on the needs of the implementer. Hibernate would be the choice for its large community, high-quality documentation, integration with many other frameworks and the support provided by JBoss. EclipseLink, however, is the reference implementation of the JPA standard, and its biggest advantage is its great support for stored procedures. Finally, OpenJPA contains the most powerful features in the area of schema generation and the generation of persistence classes from an existing database schema.

However, as I discovered in chapter 3, if a project heavily utilizes the native features of any JPA implementation, it is necessary to remember that the migration from one provider to another will be difficult and will require significant changes in the code base.

The practical part involved creating an application that supports migrating a project from one JPA provider to another. The application is developed in the Scala programming language and is called JPA Migration. It goes through all the sources contained in the source directory, searches for vendor-specific features and tries to migrate them automatically, either to the standard API or to Hibernate if there is no standard alternative. If it is not possible to do so automatically, it displays a hint on how the particular vendor-specific feature could be migrated manually, as well as a link to the corresponding chapter in the documentation.


A Generated database schemas
A.1 Hibernate

create table "user" (
    user_id int8 not null,
    enabled boolean,
    username varchar(255) not null unique,
    primary key (user_id)
)

create table article (
    article_id int8 not null,
    content varchar(255) not null,
    created timestamp,
    headline varchar(255) not null,
    discussion_id int8 not null,
    primary key (article_id)
)

create table articles_tags (
    article_id int8 not null,
    tag_id int8 not null,
    primary key (article_id, tag_id)
)

create table discussion (
    discussion_id int8 not null,
    primary key (discussion_id)
)

create table discussions_posts (
    discussion_id int8 not null,
    post_id int8 not null unique,
    unique (post_id)
)

create table post (
    post_id int8 not null,
    author varchar(255) not null,
    created timestamp,
    text varchar(255) not null,
    title varchar(255) not null,
    primary key (post_id)
)

create table tag (
    tag_id int8 not null,
    name varchar(255) not null unique,
    primary key (tag_id)
)

create table users_articles (
    user_id int8 not null,
    article_id int8 not null,
    primary key (user_id, article_id)
)

create table users_authorities (
    user_id int8 not null,
    name varchar(255) not null,
    primary key (user_id, name)
)

alter table article
    add constraint FKD458CCF6386589CA
    foreign key (discussion_id)
    references discussion

alter table articles_tags
    add constraint FK487AF8DBDF6FF6EA
    foreign key (article_id)
    references article

alter table articles_tags
    add constraint FK487AF8DBBBDB3C6A
    foreign key (tag_id)
    references tag

alter table discussions_posts
    add constraint FKA16447DF386589CA
    foreign key (discussion_id)
    references discussion

alter table discussions_posts
    add constraint FKA16447DF48822CA
    foreign key (post_id)
    references post

alter table users_articles
    add constraint FK2F4ABE94DF6FF6EA
    foreign key (article_id)
    references article

alter table users_articles
    add constraint FK2F4ABE941316CEEA
    foreign key (user_id)
    references "user"

alter table users_authorities
    add constraint FK6555336A1316CEEA
    foreign key (user_id)
    references "user"

create sequence sample_sequence

Listing A.1: Hibernate-generated sample database schema

A.2 OpenJPA
CREATE SEQUENCE sample_sequence START WITH 1 CACHE 50;
CREATE TABLE "user" (user_id BIGINT NOT NULL, enabled BOOL, username VARCHAR(255) NOT NULL, PRIMARY KEY (user_id), CONSTRAINT U_USER_USERNAME UNIQUE (username));

CREATE TABLE article (article_id BIGINT NOT NULL, content VARCHAR(255) NOT NULL, created TIMESTAMP, headline VARCHAR(255) NOT NULL, discussion_id BIGINT NOT NULL, PRIMARY KEY (article_id));
CREATE TABLE articles_tags (article_id BIGINT, tag_id BIGINT);
CREATE TABLE discussion (discussion_id BIGINT NOT NULL, PRIMARY KEY (discussion_id));
CREATE TABLE discussions_posts (discussion_id BIGINT NOT NULL, post_id BIGINT NOT NULL, CONSTRAINT U_DSCSSTS_POST_ID UNIQUE (post_id));
CREATE TABLE post (post_id BIGINT NOT NULL, author VARCHAR(255) NOT NULL, created TIMESTAMP, text VARCHAR(255) NOT NULL, title VARCHAR(255) NOT NULL, PRIMARY KEY (post_id));
CREATE TABLE tag (tag_id BIGINT NOT NULL, name VARCHAR(255) NOT NULL, PRIMARY KEY (tag_id), CONSTRAINT U_TAG_NAME UNIQUE (name));
CREATE TABLE users_articles (user_id BIGINT, article_id BIGINT);
CREATE TABLE users_authorities (user_id BIGINT, name VARCHAR(255) NOT NULL);
CREATE INDEX I_ARTICLE_DISCUSSION ON article (discussion_id);
CREATE INDEX I_RTCLTGS_ARTICLE_ID ON articles_tags (article_id);
CREATE INDEX I_RTCLTGS_ELEMENT ON articles_tags (tag_id);
CREATE INDEX I_DSCSSTS_DISCUSSION_ID ON discussions_posts (discussion_id);
CREATE INDEX I_DSCSSTS_ELEMENT ON discussions_posts (post_id);
CREATE INDEX I_SRS_CLS_ELEMENT ON users_articles (article_id);
CREATE INDEX I_SRS_CLS_USER_ID ON users_articles (user_id);
CREATE INDEX I_SRS_RTS_USER_ID ON users_authorities (user_id);

Listing A.2: OpenJPA-generated sample database schema


A.3 EclipseLink

CREATE TABLE article (article_id BIGINT NOT NULL, content VARCHAR(255) NOT NULL, created TIMESTAMP, headline VARCHAR(255) NOT NULL, discussion_id BIGINT NOT NULL, PRIMARY KEY (article_id))
CREATE TABLE discussion (discussion_id BIGINT NOT NULL, PRIMARY KEY (discussion_id))
CREATE TABLE post (post_id BIGINT NOT NULL, author VARCHAR(255) NOT NULL, created TIMESTAMP, text VARCHAR(255) NOT NULL, title VARCHAR(255) NOT NULL, PRIMARY KEY (post_id))
CREATE TABLE tag (tag_id BIGINT NOT NULL, name VARCHAR(255) NOT NULL UNIQUE, PRIMARY KEY (tag_id))
CREATE TABLE "user" (user_id BIGINT NOT NULL, enabled BOOLEAN, username VARCHAR(255) NOT NULL UNIQUE, PRIMARY KEY (user_id))
CREATE TABLE users_articles (article_id BIGINT NOT NULL, user_id BIGINT NOT NULL, PRIMARY KEY (article_id, user_id))
CREATE TABLE articles_tags (article_id BIGINT NOT NULL, tag_id BIGINT NOT NULL, PRIMARY KEY (article_id, tag_id))
CREATE TABLE discussions_posts (discussion_id BIGINT NOT NULL, post_id BIGINT NOT NULL UNIQUE, PRIMARY KEY (discussion_id, post_id))
CREATE TABLE users_authorities (user_id BIGINT, name VARCHAR(255) NOT NULL)
ALTER TABLE article ADD CONSTRAINT FK_article_discussion_id FOREIGN KEY (discussion_id) REFERENCES discussion (discussion_id)
ALTER TABLE users_articles ADD CONSTRAINT FK_users_articles_article_id FOREIGN KEY (article_id) REFERENCES article (article_id)
ALTER TABLE users_articles ADD CONSTRAINT FK_users_articles_user_id FOREIGN KEY (user_id) REFERENCES "user" (user_id)

ALTER TABLE articles_tags ADD CONSTRAINT FK_articles_tags_tag_id FOREIGN KEY (tag_id) REFERENCES tag (tag_id)
ALTER TABLE articles_tags ADD CONSTRAINT FK_articles_tags_article_id FOREIGN KEY (article_id) REFERENCES article (article_id)
ALTER TABLE discussions_posts ADD CONSTRAINT FK_discussions_posts_post_id FOREIGN KEY (post_id) REFERENCES post (post_id)
ALTER TABLE discussions_posts ADD CONSTRAINT FK_discussions_posts_discussion_id FOREIGN KEY (discussion_id) REFERENCES discussion (discussion_id)
ALTER TABLE users_authorities ADD CONSTRAINT FK_users_authorities_user_id FOREIGN KEY (user_id) REFERENCES "user" (user_id)
CREATE SEQUENCE sample_sequence INCREMENT BY 50 START WITH 50

Listing A.3: EclipseLink-generated sample database schema


Bibliography
[1] R. Ramakrishnan and J. Gehrke, Database management systems. McGraw-Hill international editions: Computer science series, McGraw-Hill, 2003.
[2] A. Silberschatz, H. Korth, and S. Sudarshan, Database System Concepts. McGraw-Hill, 2010.
[3] R. Stephens, Beginning Database Design Solutions. John Wiley & Sons, 2010.
[4] N. H. Bercich, “The evolution of the computerized database,” ARXIV, 2003.
[5] E. F. Codd, “A relational model of data for large shared data banks,” Commun. ACM, vol. 13, pp. 377–387, June 1970.
[6] C. Bauer and G. King, Java persistence with Hibernate. Manning Pubs Co Series, Manning, 2007.
[7] M. Keith and M. Schincariol, Pro JPA 2: mastering the Java Persistence API. Apress Series, Apress, 2009.
[8] “JSR 317: Java Persistence API, Version 2.0.” http://jcp.org/en/jsr/detail?id=317, Dec. 2009.
[9] “Hibernate developer guide.” http://docs.jboss.org/hibernate/core/4.0/devguide/en-US/html/.
[10] “OpenJPA user’s guide.” http://openjpa.apache.org/builds/2.1.1/apache-openjpa/docs/manual.html.
[11] “EclipseLink 2.3 API documentation.” http://www.eclipse.org/eclipselink/api/2.3/index.html.
[12] “Hibernate ORM 4.0 API documentation.” http://docs.jboss.org/hibernate/orm/4.0/javadocs/.
[13] “EclipseLink project wiki.” http://wiki.eclipse.org/Category:EclipseLink/Documentation/JPA.
[14] “OpenJPA 2.2.0 API documentation.” http://openjpa.apache.org/builds/2.2.0/apidocs/index.html.
[15] M. Fisher, J. Ellis, and J. Bruce, JDBC API Tutorial and Reference. Java Series, Addison-Wesley, 2003.
[16] “JSR 303: Bean Validation.” http://jcp.org/en/jsr/detail?id=303, Nov. 2009.
[17] “Hibernate Validator reference documentation.” http://docs.jboss.org/hibernate/validator/4.2/reference/en-US/html/.
[18] M. Odersky, L. Spoon, and B. Venners, Programming in Scala. Artima Series, Artima Press, 2011.
