
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/262158506

Enhanced segment trees in object-relational mapping

Conference Paper · September 2013


DOI: 10.1145/2490257.2490291


All content following this page was uploaded by Krzysztof J. Stencel on 15 May 2014.



Enhanced Segment Trees In Object-Relational Mapping

Michał Gawarkiewicz
Faculty of Mathematics and Computer Science
Nicolaus Copernicus University
Toruń, Poland
garfi@mat.umk.pl

Piotr Wiśniewski
Faculty of Mathematics and Computer Science
Nicolaus Copernicus University
Toruń, Poland
pikonrad@mat.umk.pl

Krzysztof Stencel
Institute of Informatics
University of Warsaw
Warsaw, Poland
stencel@mimuw.edu.pl

ABSTRACT
Tree-shaped data often occur in business applications, e.g. a corporate hierarchy or a categorization of products. A natural class of analytic queries posed to such data consists of aggregate queries over subtrees. Evaluation of such queries in large data sets requires a significant amount of time. In this paper we focus on dedicated data structures that materialize partial results of such queries in the form of the well-known segment trees. In a multiprogramming environment such data structures require careful implementation. A naïve design is going to suffer from synchronization problems. The root of such a structure will be updated by each transaction that changes anything down its subtree. We propose ring updates that allow using the presented data structure with multiple execution threads. Our implementation is designed to work with object-relational mapping systems. If an application uses stored hierarchical data, its designer can add annotations to augment mapped database objects with materialization of partial aggregations over subtrees. Mapping generators create all necessary storage objects and triggers. We describe our proof-of-concept prototype implementation of this feature in Hibernate. We also present an experimental evaluation of this prototype's performance. The results confirm that the proposed materializations notably boost the evaluation of analytical queries over hierarchies.

Categories and Subject Descriptors
H.2.6 [Database Management]: Middleware for databases - Object-relational mapping facilities

General Terms
Performance

Keywords
materialized views, analytical queries, hierarchical data, object-relational mapping

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
BCI'13 September 19-21, 2013, Thessaloniki, Greece.
Copyright 2013 ACM 978-1-4503-1851-8/13/09 ...$15.00.

1. INTRODUCTION
The architecture of an application is usually a collection of trade-offs. On one hand, clear architectures facilitate development and maintenance. They also reduce the cost of these activities. On the other hand, they may hinder performance. One possible solution to this problem is the introduction of additional layers. Materialized views and object-relational mapping systems are examples of such layers. In this paper we analyze a class of solutions based on these observations. Object-relational mappers (ORMs) are still underutilized. Moreover, they are often recognized as a performance hazard. In spite of the common perception that ORMs just provide a direct mapping between objects and rows, they also constitute a layer of middleware. This layer can be used to conceal a plethora of performance solutions that (1) significantly reduce the response time of an application and (2) are invisible to application programmers.

In prequel papers [1, 2, 3] we integrated recursive queries into ORMs. We showed that the speed of query processing against recursive structures significantly increased. We also enhanced ORMs with solutions to materialize partial aggregation [4]. They allow fast aggregate query processing without obscuring the architecture of an application.

In this paper we propose solutions that cater for notably trickier application needs. Assume a dimension table that is organized as a hierarchy, e.g. the employee table with a subordinate-manager many-to-one relationship. The fact table contains sales data. Each sale is connected to one employee. This database is frequently queried for the total sales of a given employee and all his/her subordinates. Such queries are needed e.g. in companies that perform multi-level marketing.

We propose to accelerate such queries using materialized data structures similar to segment trees. We present enhanced segment trees that are (1) well-suited for any trees (possibly non-binary) and (2) efficient in a multithreaded execution environment. The root of a segment tree has to be updated by each transaction that modifies anything below. The lock contention and possible deadlocks caused by the naïve solution are unacceptable in any application. The method proposed in this paper solves such synchronization problems.

We use the object-relational mapping layer to conceal all peculiarities of the solution. Generators built into the ORM take care of creating appropriate materializations in the database and of synchronizing concurrent updates.

Contributions.
This paper makes the following contributions:

• we propose enhanced segment trees as a materialization of partial aggregations in hierarchies in order to improve performance of analytical queries over these hierarchies;

• we present an efficient method to update such structures in a multithreaded environment;

• we present a proof-of-concept implementation of enhanced counting trees using Hibernate;

• we show performance experiments that attest to the robustness of the proposed solution.

The paper is organized as follows. In Section 2 we address the related work. Section 3 describes the structure of enhanced counting trees. Section 4 presents an example application of this data structure. Section 5 explains how enhanced counting trees can be used in applications. Section 6 contains the presentation of the concept of ring updates. Section 7 depicts the update algorithm. Section 8 reports results of performance tests. Section 9 concludes.

2. RELATED WORK
The algorithms and data structures presented in this paper have been inspired by a number of efforts by other researchers. The idea of enhanced segment trees is based on segment trees [5], i.e. standard data structures used e.g. in computational geometry. Asynchronous updates and eventual consistency [6] are unavoidable if we require short response times in a multithreaded environment.

Our solution is a kind of materialized view. A recent example of an implementation of such views is Flexviews [7] for MySQL, based on the results described in [8, 9]. Flexviews relies on applying changes that have been written to the change log. We do not want to rely on the log, since our solution is a part of the ORM middleware that has no access to database logs. We used additional storage that imitates logs in the method of ring updates (see Section 6), which assures eventual consistency and robustness in multiuser environments.

The results of our earlier research that influenced the solutions presented in this paper are summarized in the following subsections.

2.1 Recursive Queries in ORM
We developed methods to integrate recursive queries and object-relational mapping systems [1, 2, 3]. We also implemented them in Hibernate and experimentally verified their robustness. If, in the absence of such features, a programmer is tasked to query recursive structures, he/she will create a simple loop that retrieves children/neighbours of subsequently visited nodes. Such a solution is unacceptably slow due to multiple network round-trips and DBMS overhead: parsing, optimisation and execution of multiple individual queries. Instead, we proposed intuitive interfaces based on XML files and Java annotations that allow performing the whole recursive processing in one query. Listing 1 shows a definition of a recursive query over the corporate hierarchy. Such a query can also be defined using Java annotations as presented in Listing 2.

Listing 1: XML used to define a recursive query

    <rcte>
      <rcteTable name="subordinates"/>
      <tables><table>Emp</table></tables>
      <recursive-condition>
        <on>Emp.bossId</on>
        <to>Emp.empId</to>
      </recursive-condition>
      <summands>
        <conc>Emp.empId</conc>
        <conc>Emp.sname</conc>
      </summands>
      <filter section="seed">
        Emp.sname = $Param(sname)
      </filter>
    </rcte>

Listing 2: Java annotations used to define a recursive query

    package sample.recursive.mapping;
    import org.ncu.hibernate.annotations.*;
    @RecursiveQuery(maxLevel = 4)
    @Tables(name = "Emp")
    @RecursiveCondition(on = "Emp.bossID",
                        to = "Emp.empId")
    @Summands(conc = { "Emp.empId",
                       "Emp.sname" })
    @Filter(seed = "Emp.sname=$Param(sname)")
    public class Subordinates {
        @Column(name = "Emp.empID")
        public String id;
        ...}

Dedicated generators compile such definitions and build a class that sends a single query to the underlying DBMS. Then, it creates objects based on the results. If the DBMS at hand supports recursive queries, generators will construct them. We also took care of database systems whose SQL dialect has no recursion, e.g. MySQL. Then, the generated query is the unrolling of the desired query to a certain prescribed level [10].

Table 1 summarizes the performance experiments with the two discussed approaches. Its second and third columns contain the execution times of the naïve loop and the single query, respectively. The fourth column presents the ratio between the two. The gains of the single query approach are apparent.

Table 1: Efficiency of approaches to recursion

    Record count   Simple loop   Single query   Ratio
    900            3033 ms       143 ms         4.71 %
    1800           8669 ms       266 ms         2.61 %
    2700           17550 ms      271 ms         1.54 %
    3500           28391 ms      405 ms         1.43 %
    4500           41500 ms      414 ms         1.00 %

2.2 Partial aggregation in ORM
We also researched extending ORM with materialization for aggregate analytical queries [4]. Storing partial sums allows optimizing aggregate queries that are remarkably time consuming without additional data structures. Instead of computing the answer from scratch, we can pre-aggregate partial answers upfront and use them to build the result. If we forecast the future workload well, the gains can be measured in orders of magnitude. Dedicated triggers keep the materialized values up to date. Stored aggregates are thus always consistent.

As with recursive queries (see Section 2.1), most of this solution is realized in ORM generators. Base data to aggregate, functions to be used and the desired granularity are defined as Java annotations. Listing 3 shows the class InvoiceLine with annotations stating that (1) quantities are to be aggregated, (2) the function to be used is the sum, (3) the aggregation is per product and per date of the invoice subobject.

Listing 3: The class InvoiceLine with annotations

    @Entity
    public class InvoiceLine {
        @DWDim(Dim = "date")
        private Invoice invoice;
        private Long id;
        @DWDim
        private Product product;
        @DWAgr(function = "SUM")
        private Integer quantity;
    }

Figure 1 shows the schema of the resulting database. It contains base tables customer, invoice and invoiceitem. The other three tables from this figure store derived data, i.e. materializations of aggregated quantities. The table InvoiceLine is augmented with triggers dispatched after insertions, updates and removals. They take care of the consistency of the aggregated data. These tables and triggers are created automatically by generators built into the ORM. Application programmers have no access to these objects. Furthermore, developers need not be aware of the existence of such facilities. They just use queries and are pleased with their efficiency.

3. EXTENDING SEGMENT TREES
A standard segment tree [5] is a static perfect binary tree, i.e. all its leaves are on the same level and every internal node has exactly two children. Leaves of such a structure store base data, while internal nodes keep sums of the values of their descendant leaves. Figure 2 shows an example of such a tree.

Figure 2: A standard segment tree

Data structures used in this paper are based on the idea of segment trees. Firstly, we drop the assumption that the trees are static. They can be reshaped by the application since they reflect real hierarchies describing stored business data.

Secondly, we assume that the trees can have any shape, i.e. they need not be binary, and thus need not be perfect. Figure 3 presents an example of such a tree. The tree is no longer balanced and binary (notice nodes 7 and 18 on the leftmost path).

Figure 3: A "segment tree" but not binary and not perfect

Thirdly, in the proposed structures internal nodes hold base values that are to be aggregated. Since they are natural parts of the stored hierarchy, they have values, e.g. salaries in the hierarchy of employees. A typical query computes the total salary of an employee and all his/her subordinates. Therefore, each node holds two numbers: its individual value and the derived sum of its value and the values of all its descendants. Figure 4 presents an example of such a structure. In each node, the left number is its individual value, while the right number is the sum of the subtree rooted in this node.

Figure 4: Enhanced counting tree in the consistent state

Fourthly, since the proposed data structure is to be used in a multiuser environment, we allow temporary inconsistencies. Sometimes, the derived sums may be wrong due to ongoing updates down the subtree. We assume eventual consistency [6] as the only feasible consistency model under such assumptions. It means that if applications stop modifying the tree, eventually all partial sums will become consistent. If we did not introduce this relaxation, the proposed data structure would be infeasible due to deadlocks and lock contention, especially at the root of the whole tree.
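The node layout and the meaning of eventual consistency described in this section can be made concrete with a small in-memory Java sketch. It is an illustration only: the paper keeps the nodes in database tables, and the names CountingTreeSketch, Node, add, setValue and drain are hypothetical. The worklist of possibly stale nodes is merely one assumed way of repairing the sums; the point is that stored sums may be wrong right after a change and become correct once the pending work is finished.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical in-memory model of a tree with any branching factor. */
final class CountingTreeSketch {

    static final class Node {
        final Node parent;
        final List<Node> children = new ArrayList<>();
        long value;       // individual value (left number in Figure 4)
        long subtreeSum;  // stored sum of the whole subtree (right number)

        Node(Node parent, long value) {
            this.parent = parent;
            this.value = value;
            this.subtreeSum = value; // a fresh leaf is consistent by itself
            if (parent != null) {
                parent.children.add(this);
            }
        }

        /** Sum implied by the own value and the children's stored sums. */
        long expectedSum() {
            long sum = value;
            for (Node child : children) {
                sum += child.subtreeSum;
            }
            return sum;
        }
    }

    /** Nodes whose stored sum may be stale. */
    private final Deque<Node> pending = new ArrayDeque<>();

    Node add(Node parent, long value) {
        Node node = new Node(parent, value);
        if (parent != null) {
            pending.add(parent); // the parent's stored sum is now stale
        }
        return node;
    }

    void setValue(Node node, long value) {
        node.value = value;
        pending.add(node);
    }

    /** Repairs stale sums bottom-up; returns how many entries were handled. */
    int drain() {
        int handled = 0;
        while (!pending.isEmpty()) {
            Node node = pending.poll();
            handled++;
            long fresh = node.expectedSum();
            if (fresh != node.subtreeSum) {
                node.subtreeSum = fresh;
                if (node.parent != null) {
                    pending.add(node.parent); // change moves one level up
                }
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        CountingTreeSketch tree = new CountingTreeSketch();
        Node root = tree.add(null, 10);
        Node child = tree.add(root, 20);
        tree.add(child, 30);
        System.out.println("before drain: " + root.subtreeSum);
        tree.drain();
        System.out.println("after drain:  " + root.subtreeSum);
    }
}
```

Between a modification and the next call to drain, a reader of root.subtreeSum sees an outdated but previously valid sum. No writer ever has to lock the whole path to the root.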

Figure 1: The database schema with base and derived tables

4. MOTIVATING EXAMPLE
Assume a company that has multilevel marketing as its business model. Its employees form a hierarchy that is crucial during the computation of individual performance and compensation. Each invoice is attributed to a single employee who is eligible to gain parts of the corresponding revenue. Higher level employees are also entitled to fractions of the revenues of their subordinates. Figure 5 shows extracts from the database schema of our example company.

Figure 5: The original database schema for a company doing multilevel marketing

We assume that the accounting department often executes the following two queries: (1) for an employee compute the total sales performed by him/her and all his/her direct and indirect subordinates; (2) for an employee and period compute the total sales performed by him/her and all his/her direct and indirect subordinates in the given time interval.

Let us examine various methods to execute such queries through the object-relational mapping system Hibernate. We start from a plain system without any enhancements and then consider two optimized methods published so far: partial aggregation [4] and recursive query generation [2].

Plain Hibernate. For an employee we traverse his/her whole subtree and execute an aggregation query over all his/her sales, i.e. we retrieve invoices and invoice lines. This means a significant load on the database caused by numerous heavy queries. This execution plan is way too costly.

Partial aggregation. This scenario is similar in the number of queries to be executed. For each employee down the hierarchy of a given employee we send an aggregate query. This time auxiliary data structures notably accelerate the queries. However, the execution time still seems to be too long.

Generation of recursive queries. Listing 4 shows the recursive query generated for this problem. It contains two parts. The first part finds the subordinates of the given employee, while the second part computes total sales for each individual employee. The construction of this query is thus clean. However, since all data must be aggregated upfront, the query is inefficient.

Therefore, existing solutions are insufficient to run our example queries.

Listing 4: Non-optimized recursive query for sales of an employee :empname and his/her subordinates within the period :date1-:date2.

    -- Part 1: the given employee and all direct and indirect subordinates
    WITH RECURSIVE rcte AS (
        SELECT emp_id FROM empl
        WHERE name = :empname
        UNION
        SELECT e.emp_id
        FROM rcte r JOIN empl e
        ON (e.boss_id = r.emp_id)),
    -- Part 2: total sales of each individual employee within the period
    sales AS (
        SELECT i.emp_id AS emp_id,
               sum(il.value) AS sum_value
        FROM invoice i
        JOIN invoice_line il USING (inv_id)
        WHERE i.date BETWEEN :date1 AND :date2
        GROUP BY i.emp_id)
    SELECT sum(sum_value)
    FROM rcte JOIN sales USING (emp_id)
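To make the cost argument behind Listing 4 tangible, the following Java sketch mirrors its two parts on in-memory data. It is an assumption-laden illustration, not code from the prototype: the class SubtreeSalesSketch, the Sale type and the bossOf map (employee id to boss id, with no entry for the top of the hierarchy) are hypothetical stand-ins for the employee and invoice tables.

```java
import java.time.LocalDate;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for the tables queried by Listing 4. */
final class SubtreeSalesSketch {

    static final class Sale {
        final int empId;
        final LocalDate date;
        final long value;

        Sale(int empId, LocalDate date, long value) {
            this.empId = empId;
            this.date = date;
            this.value = value;
        }
    }

    static long totalSales(Map<Integer, Integer> bossOf, List<Sale> sales,
                           int rootId, LocalDate from, LocalDate to) {
        // Part 1: collect the employee and all direct and indirect subordinates.
        Map<Integer, List<Integer>> subordinates = new HashMap<>();
        for (Map.Entry<Integer, Integer> e : bossOf.entrySet()) {
            subordinates.computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                        .add(e.getKey());
        }
        Set<Integer> subtree = new HashSet<>();
        Deque<Integer> todo = new ArrayDeque<>();
        todo.add(rootId);
        while (!todo.isEmpty()) {
            Integer id = todo.poll();
            if (subtree.add(id)) {
                todo.addAll(subordinates.getOrDefault(
                        id, Collections.<Integer>emptyList()));
            }
        }
        // Part 2: scan the sales of the period (bounds inclusive, like BETWEEN)
        // and keep those attributed to an employee of the subtree.
        long total = 0;
        for (Sale s : sales) {
            boolean inPeriod = !s.date.isBefore(from) && !s.date.isAfter(to);
            if (inPeriod && subtree.contains(s.empId)) {
                total += s.value;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> bossOf = new HashMap<>();
        bossOf.put(2, 1);
        List<Sale> sales = new ArrayList<>();
        sales.add(new Sale(2, LocalDate.of(2013, 2, 1), 50));
        System.out.println(totalSales(bossOf, sales, 1,
                LocalDate.of(2013, 1, 1), LocalDate.of(2013, 12, 31)));
    }
}
```

Every call walks the whole subtree and scans the sales of the period again, so the work grows with the size of the hierarchy and of the fact table rather than with the size of the answer, which is a single number.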

5. ENHANCING SEGMENT TREES
As discussed in Section 4, contemporary solutions are not enough to execute our example queries taken from a business domain. We propose enhancing segment trees (see Section 3) in order to cater for the efficiency of such queries.

Upon the configuration defined by a user either in an XML file or in annotations, Hibernate generators will create appropriate storage objects to hold materialized values. Listing 5 shows an example Java class with the proposed annotations. For such a class, ORM generators create the additional objects presented in Figure 6. They are tables reg_aggr_empl and reg_aggr_empl_date that store nodes of the appropriate enhanced segment tree.

Listing 5: Java annotations that cause generation of an enhanced segment tree.

    @Entity
    @SegmentTree(prior = "boss_id")
    public class Empl {
        ...
        @SegmentTree(Dim = "date, noDim",
                     TreeAggr = "SUM",
                     aggr = "SUM(val)")
        private List<Invoice> invoice;

        @SegmentTree(TreeAggr = "SUM")
        private Currency salary;
        ...
    }

Figure 6: Example database schema extended with the storage for materialized aggregations.

An example of the actual segment tree is depicted in Figure 7. Each node contains the total sales of the corresponding employee (sum_invoice_val) and the total sales of his/her subtree (ct_sum_invoice_val).

Figure 7: Enhanced segment tree

Whenever the application issues the query for total sales of an employee's hierarchy, the ORM middleware will address the database system with queries like those from Listings 6 and 7.

Listing 6: The segment tree query for sales of an employee :empname and his/her subordinates.

    SELECT ct_sum_invoice_val
    FROM rec_aggr_empl
    JOIN empl USING (empl_id)
    WHERE empl.name = :empname

Listing 7: The segment tree query for sales of an employee :empname and his/her subordinates within the period :date1-:date2.

    SELECT SUM(ct_sum_invoice_val)
    FROM rec_aggr_empl_date
    JOIN empl USING (empl_id)
    WHERE empl.name = :empname
    AND date BETWEEN :date1 AND :date2

Such queries are fast since they access materialized aggregates. However, such a data structure has to be kept up to date. Incoming transactions that create, modify and delete invoices must be reflected in the enhanced segment tree. A straightforward implementation of this feature will suffer from significant lock contention and even deadlocks, since every update will have to traverse the path to the root. Along the way it will try to acquire expensive transactional locks. We address this problem in the following sections.

6. RING UPDATES
The value stored in the root of an enhanced segment tree depends on the values of its children. Therefore, any update of the tree must be reflected in the root. A direct implementation of updates is thus infeasible, since the root will immediately become the bottleneck and the cause of deadlocks.

There are a number of possible solutions to the problem of updates. Each of them has to adopt asynchronous updates. The first idea may be to adopt a threshold value of change. Each node remembers the old value and the estimated change. Only if the estimated change exceeds the threshold is the change propagated towards its parent. Thresholds may vary depending on the level of a node. The closer to the root the node is, the higher the threshold. As a result we achieve lower update sensitivity of higher nodes. Thus, although a node closer to the root is subject to more changes (it has more descendants), it will actually be updated less often due to the higher threshold.

The second idea is based on a change queue. If a node is inserted, modified or deleted, the information on this event

will be pushed into a queue and not propagated toward the parent. Each item is eventually removed from the queue and its value gets recomputed. If it is different from the stored value, it will be updated and the parent will be put into the queue. We have decided to use the second solution, since the first one does not assure eventual consistency of the data structure.

7. IMPLEMENTATION
Assume an application that uses ORM augmented with the solution advocated in this paper. At the deployment of the application, the ORM middleware generates triggers for each base table created for a class with the annotation @SegmentTree. The delete trigger enqueues the parent of the deleted node (if it exists). The update trigger and the insert trigger enqueue just the affected node. The same node may simultaneously occur several times in the queue. This happens for frequently modified nodes.

A separate process periodically checks the queue and, if it is not empty, pops the first node from the queue. If this node no longer exists, it will be ignored. This may happen if the node has been updated and then removed before the update event from the queue is processed. Otherwise, the segment tree gets updated. The new aggregate value is computed using the appropriate function (e.g. sum, min, max) on the node's own value and the aggregates stored in its direct children. If the new value is different from the old value, the new aggregate will be stored and the parent of the node (if it exists) will be enqueued. Finally, all occurrences of the updated node are removed from the queue, since there is nothing more to do for the node at hand.

This implementation is robust since even frequently modified nodes do not cause heavy update propagation towards the root. The delay in update transmission and the fusion of events significantly limit the number of operations at the root and other internal nodes.

8. PERFORMANCE EVALUATION
In this section we describe the results of our experiments with the proposed solution. We use the database schema for companies doing multilevel marketing (see Figure 5). The tests include typical operations in such companies, like (1) insertions of new data, (2) updates of existing data, (3) removals, and (4) retrieval of aggregates.

In the presented tables we compare the proposed solution (identified as EST for enhanced segment trees) against the architecture without any materialized aggregates (identified as Plain). The plain variant is the only one possible with contemporary object-relational mapping systems.

The test of insertions consisted in placing 100 000 new records into an empty database. Records were placed in random locations in the tree data structure. The run-time necessary to draw locations to put/update/remove/query records is not included in the results below. 10 simultaneous clients were loading the system with insertions of records. Analogously, each of 10 clients concurrently issued 1 000 analytical queries for aggregates. The queries concerned randomly picked nodes of the tree. Updates were also sent by 10 clients. Each of them updated 1 000 random records. Removals were generated by one client that deleted 100 000 records in random order.

Immediately after the tests for inserts/updates/deletions a separate update process was started. Its task was to recover the consistency of the materialized partial aggregates stored in the enhanced segment tree. The run-time of this process is included in the measurements presented in the tables below.

Table 2: Comparison of the two solutions, when the initial database is empty.

    Operation   EST          Plain           Ratio
    Insert      240 039 ms   44 610 ms       538.00 %
    Query       6 936 ms     4 068 200 ms    0.17 %
    Update      35 727 ms    3 966 ms        901.00 %
    Delete      348 820 ms   3 619 ms        9 639.00 %
    Total       631 522 ms   4 120 395 ms    15.33 %

Table 3: Comparison of the two solutions, when the initial database has 200 000 records.

    Operation   EST          Plain           Ratio
    Insert      369 064 ms   46 387 ms       796.00 %
    Query       3 938 ms     16 726 200 ms   0.02 %
    Update      29 432 ms    5 998 ms        491.00 %
    Delete      191 984 ms   3 375 ms        5 688.00 %
    Total       594 418 ms   16 781 960 ms   3.54 %

Table 2 reports results for the scenario that starts with an empty database, after which the records are inserted, queried, updated and deleted. As we can see, all operations but the query are significantly slower for enhanced segment trees. It is not surprising since the plain solution just performs single SQL statements, while EST also updates the materialized aggregates. However, queries are dramatically faster for EST, since they consist in reading single database rows only. The gains at query time far outweigh the cost inherent in EST updates. The total execution time for both scenarios proves the strength of the proposed solution (EST). Note that in this scenario, the number of queries is equal to the number of updates. This obviously favours the plain solution. In real applications the fraction of queries in the whole workload is commonly higher. Under such assumptions the advantage of EST will be even greater.

Table 3 shows the results of a similar experiment. However, this time we started from the database with 200 000 records and then performed the same operations as in the first experiment. In the plain scenario there are no changes in the durations of data modifications. Nonetheless, the cost of queries quadrupled. Each user query had to process more data and was executed as more SQL queries. In the case of EST, inserts were more costly, but other operations were cheaper. Removals in particular turned out to be faster, since a randomly picked node was more likely to be close to the leaf level. Since in the first test we had to delete all records, all levels were equally probable. The cost of queries remained the same, since retrievals in the EST variant are SQL point queries. In this scenario, the benefit of EST is even more evident.

Tables 4 and 5 contain another perspective on the two abovementioned experiments summarized in Tables 2 and 3. Tables 4 and 5 do not include the time needed to bring the aggregates materialized in EST up to date. Under such a scenario, the queries in the EST variant return non-exact results. The overheads of EST are caused by queuing nodes

to be synchronized by the separate update process. These two tables attest that the main time complexity lies in the materialization of aggregates. Nevertheless, the experiments demonstrate that this overhead is worth its price.

Table 4: Comparison of the two solutions, when the update of materialized aggregates is off and the initial database is empty.

    Operation   EST         Plain       Ratio
    Insert      50 210 ms   44 610 ms   113 %
    Update      5 621 ms    3 966 ms    142 %
    Delete      14 634 ms   3 619 ms    404 %

Table 5: Comparison of the two solutions, when the update of materialized aggregates is off and the initial database has 200 000 records.

    Operation   EST         Plain       Ratio
    Insert      51 473 ms   46 387 ms   111 %
    Update      3 954 ms    5 998 ms    66 %
    Delete      15 318 ms   3 375 ms    454 %

9. CONCLUSION
In this paper we describe the next step in our quest to enrich and optimize object-relational mapping (ORM). This layer of middleware need not be treated as a threat. We advocate that it should rather be perceived as an opportunity.

Nowadays ORMs are just data mappers. In our opinion we can and we should enrich them with new functionality and optimization features. Previously, we showed how to add hierarchical queries and partial aggregations to ORM. In this paper we considered a more complex scenario of hierarchical data and analytical queries to such data. We showed enhanced segment trees (EST) to store materialized partial aggregates. EST significantly accelerate querying with an acceptable overhead of updates. We also took multithreading into account. Our solution does not suffer from synchronization problems and deadlocks, at the cost of sacrificing absolute consistency for eventual consistency.

We also prepared a proof-of-concept prototype implementation of EST for Hibernate and experimentally verified its efficiency. We obtained results that attest to the robustness of EST. In the future, we plan to invent, design and develop more improvements to the ORM layer.

10. REFERENCES
[1] Szumowska, A., Burzańska, M., Wiśniewski, P., Stencel, K.: Efficient implementation of recursive queries in major object relational mapping systems. [11] 78–89
[2] Wiśniewski, P., Szumowska, A., Burzańska, M., Boniewicz, A.: Hibernate the recursive queries - defining the recursive queries using Hibernate ORM. In Eder, J., Bieliková, M., Tjoa, A.M., eds.: ADBIS (2). Volume 789 of CEUR Workshop Proceedings, CEUR-WS.org (2011) 190–199
[3] Szumowska, A., Burzańska, M., Wiśniewski, P., Stencel, K.: Extending HQL with plain recursive facilities. In Morzy, T., Härder, T., Wrembel, R., eds.: ADBIS (2). Volume 186 of Advances in Intelligent Systems and Computing, Springer (2012) 265–272
[4] Gawarkiewicz, M., Wiśniewski, P.: Partial aggregation using Hibernate. [11] 90–99
[5] Bentley, J.L.: Solutions to Klee's rectangle problems. Unpublished manuscript, Dept of Comp Sci, Carnegie-Mellon University, Pittsburgh PA (1977)
[6] Brewer, E.A.: Towards robust distributed systems (abstract). In Neiger, G., ed.: PODC, ACM (2000) 7
[7] Flexviews: Incrementally refreshable materialized views for MySQL (2012)
[8] Mumick, I.S., Quass, D., Mumick, B.S.: Maintenance of data cubes and summary tables in a warehouse. In Peckham, J., ed.: SIGMOD Conference, ACM Press (1997) 100–111
[9] Salem, K., Beyer, K., Lindsay, B., Cochrane, R.: How to roll a join: asynchronous incremental view maintenance. SIGMOD Rec. 29 (2000) 129–140
[10] Boniewicz, A., Stencel, K., Wiśniewski, P.: Unrolling SQL:1999 recursive queries. In Kim, T.h., Ma, J., Fang, W.c., Zhang, Y., Cuzzocrea, A., eds.: Computer Applications for Database, Education, and Ubiquitous Computing. Volume 352 of Communications in Computer and Information Science. Springer Berlin Heidelberg (2012) 345–354
[11] Kim, T.H., Adeli, H., Slezak, D., Sandnes, F.E., Song, X., Chung, K.I., Arnett, K.P., eds.: Future Generation Information Technology - Third International Conference, FGIT 2011 in Conjunction with GDC 2011, Jeju Island, Korea, December 8-10, 2011. Proceedings. Volume 7105 of Lecture Notes in Computer Science, Springer (2011)

