
D1 Solutions AG

a Netcetera Company

Real Life Performance of In-Memory Database Systems for BI
10th European TDWI Conference
Munich, June 2010

Authors:

Dr. Andreas Hauenstein

Dr. Simon Hefti

Dr. Andrej Vckovski

In-Memory Database Systems

Buzzwords: Column-Orientation, In-Memory, Shared Nothing

Meaning: Looks like Oracle/DB2/SQL Server from the outside, just much faster

We are talking about relational systems, queryable in SQL

We are not talking about client-side caching (MicroStrategy or QlikView do this)

There is a new generation of DB systems, for example MonetDB, Exasol, Greenplum, LucidDB
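To make the "column-orientation" buzzword concrete, here is a toy sketch in Python. It is not how any of the named products store data; the table contents and function names are invented for illustration. It only shows why an aggregate over one column can skip the other columns in a column layout.

```python
# Toy comparison of row-oriented and column-oriented storage (illustrative only).

# Row store: each record is kept together.
rows = [
    {"client": "A", "product": "X", "amount": 10.0},
    {"client": "B", "product": "Y", "amount": 20.0},
    {"client": "A", "product": "Y", "amount": 5.0},
]

# Column store: each column is kept together as one array.
columns = {
    "client": ["A", "B", "A"],
    "product": ["X", "Y", "Y"],
    "amount": [10.0, 20.0, 5.0],
}


def sum_amount_row_store(table):
    # Every whole record is visited even though only one field is needed.
    return sum(record["amount"] for record in table)


def sum_amount_column_store(table):
    # Only the "amount" array is read; the other columns are never touched.
    return sum(table["amount"])


print(sum_amount_row_store(rows), sum_amount_column_store(columns))
```

Both functions return the same total. The difference is how much data has to be read to get there, which matters for BI queries that aggregate a few columns of a wide fact table.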

Business Intelligence Data Warehouse

We are not looking at transactional systems

Any DB of an online shop or any DB driving a web site is transactional

Typically BI applications are driven by a non-transactional data store that is bulk loaded in intervals by an ETL process. This is called a data warehouse.

Next generation DB systems also exist for transactional systems. An example is Oracle TimesTen. This is a different subject.

DB Systems Specialized for Transactions (e.g. TimesTen)

DB Systems Specialized for Analytics (e.g. Teradata)

General Purpose DB Systems (e.g. Oracle, SQL Server)

Business Intelligence Generated SQL

Tools with a GUI that generate SQL statements

Examples: Business Objects, OBIEE, MicroStrategy, Cognos

No SQL tuning possible

Bad SQL

Non-technical users

Frequently changing queries

Lots of averages and sums, groupings, consolidation

Real Life Problem (1)

Consolidation of numbers along a hierarchy

Use a Parent-Child Table with a bridge table to do this in a relational DB
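A minimal sketch of the parent-child plus bridge table pattern, using Python's built-in sqlite3 module so it runs anywhere. The table names dim_org, dim_org_flat and t_facts come from this presentation. The column names (parent_id, ancestor_id, descendant_id, amount) and the sample rows are assumptions made for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_org (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    CREATE TABLE dim_org_flat (ancestor_id INTEGER, descendant_id INTEGER);
    CREATE TABLE t_facts (org_id INTEGER, amount REAL);

    INSERT INTO dim_org VALUES (1, NULL, 'Company'), (2, 1, 'Sales'),
                               (3, 1, 'Engineering'), (4, 2, 'Sales North');
    -- one row per (ancestor, descendant) pair, each node also paired with itself
    INSERT INTO dim_org_flat VALUES (1,1),(1,2),(1,3),(1,4),(2,2),(2,4),(3,3),(4,4);
    INSERT INTO t_facts VALUES (2, 100), (3, 50), (4, 25);
""")


def consolidate(connection):
    # Each node receives the facts of all its descendants via the bridge table.
    query = """
        SELECT o.name, SUM(f.amount)
        FROM dim_org o
        JOIN dim_org_flat b ON b.ancestor_id = o.id
        JOIN t_facts f ON f.org_id = b.descendant_id
        GROUP BY o.id, o.name
        ORDER BY o.id
    """
    return connection.execute(query).fetchall()


totals = consolidate(con)
print(totals)
```

The bridge table turns the recursive hierarchy walk into two plain joins and a GROUP BY, which is the kind of SQL a BI tool can generate. The cost is that the bridge table grows with the depth of the hierarchy.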

Real Life Problem (2)

Every company has this sort of problem

The most important people (CEO) experience the worst performance

OLAP tools exist because this sort of query is traditionally slow on relational systems

At a customer, 6 GB of data resulted in a 20 minute wait for the CEO

Even pre-calculating all reports overnight became difficult

The Data Model

[Figure: data model diagram. Labels: bridge table, 400 K rows; hierarchy with 8191 nodes, 12 levels, 4096 leaves; two further tables with 500 K rows and 300 K rows]

Size of the Data


Blocks
DIM_ACCOUNTING
DIM_BUSINESSTYPE
DIM_CLIENT
DIM_MEASURE
DIM_ORG
DIM_ORG_FLAT
DIM_PRODUCT
DIM_TIME
DIM_TRANS
DIM_UNIT
T_FACTS

Rows
9'780
10

532067
181

29819

453392

81

123
118

8916
53248

11875

344380

11
77

501
3001

5
723739

81
16019518

775561

17415366

Quite small data volume

Bad performance on several platforms

Realistic scenario

775561 blocks * 8192 Bytes = approx. 6 GB
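The size estimate can be checked with a few lines of Python. The block count and block size are the figures from this slide; the variable names are only for illustration.

```python
BLOCK_SIZE_BYTES = 8192   # block size stated on the slide
total_blocks = 775561     # total block count stated on the slide

total_bytes = total_blocks * BLOCK_SIZE_BYTES
size_gb = total_bytes / 10**9    # decimal gigabytes
size_gib = total_bytes / 2**30   # binary gibibytes

print(total_bytes, round(size_gb, 2), round(size_gib, 2))
```

Depending on whether decimal or binary units are used, the result lies a little above or a little below 6, so "6 GB" is a fair round figure.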

Data Generation
create_dim(
    p_bf    => 2,
    p_depth => 12,
    p_name  => 'org',
    p_cols  => 'org01,org02,org03,org04,org05,org06,org07,org08,org09,org10',
    p_types => 't10,t10,t10,t10,t10,t10,t10,t10,t10,t10'
);

One function call creates complete dimension table dim_org

Generates id column, parent pointer, bridge table dim_org_flat

Generated from a helper table with just integers and random numbers

Similar function to generate fact table

Started out as PL/SQL, now a Perl script that works with any DB

It is easy to model any scenario with this tool
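The generator itself is not shown in the slides. The following Python sketch only illustrates the idea of building an id column, a parent pointer and a bridge table in one call. It assumes that p_bf is the branching factor and that p_depth counts the levels below the root. The function below is a hypothetical stand-in, not the authors' PL/SQL or Perl code, and it leaves out the attribute columns (org01 to org10).

```python
def create_dim(bf, depth):
    """Build a full tree and return (nodes, bridge, leaves).

    nodes  : list of (id, parent_id) rows; the root has parent_id None
    bridge : list of (ancestor_id, descendant_id) rows, each node also
             paired with itself
    leaves : ids of the nodes on the lowest level
    """
    nodes = [(1, None)]
    paths = {1: [1]}              # id -> ids on the path from the root to the node
    current_level = [1]
    next_id = 2
    for _level in range(depth):
        next_level = []
        for parent in current_level:
            for _child in range(bf):
                nodes.append((next_id, parent))
                paths[next_id] = paths[parent] + [next_id]
                next_level.append(next_id)
                next_id += 1
        current_level = next_level
    bridge = [(ancestor, node) for node, path in paths.items() for ancestor in path]
    return nodes, bridge, current_level


nodes, bridge, leaves = create_dim(bf=2, depth=12)
print(len(nodes), len(leaves))
```

Under these assumptions, a branching factor of 2 and a depth of 12 give 8191 nodes and 4096 leaves, which matches the hierarchy size shown on the data model slide.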

The Test Query

Generated by BI tool

Initial Tests on Oracle and SQL Server


Query times by number of aggregated fact rows:

Machine                                                    OS                   DBMS                16 Mio    1 Mio     3500      Description
IBM 9117-570, 8 GB RAM, 1.9 GHz, 4 CPUs                    AIX                  Oracle 10G          1200 sec  168 sec   167 sec   Expensive Production Server
Dell Dimension E521, 4 GB RAM                              Windows 2003 Server  Oracle 10G          1023 sec  205 sec   159 sec   Home PC
Dell Dimension E521, 4 GB RAM                              Windows 2003 Server  MS SQL Server 2005  741 sec   699 sec   293 sec
HP DL 380 Proliant Server, 0.5 GB RAM, Intel Xeon 3.2 GHz  Red Hat Linux        Oracle 10G          1432 sec  413 sec   386 sec   Linux with little RAM

All the same order of magnitude

Adding RAM does not help a traditional DB

PCs are better than you think

A New Generation DB System


Query times by number of aggregated fact rows:

Machine                                                        OS                              DBMS                16 Mio    1 Mio     3500      Description
IBM 9117-570, 8 GB RAM, 1.9 GHz, 4 CPUs                        AIX                             Oracle 10G          1200 sec  168 sec   167 sec   Expensive Production Server
Dell Dimension E521, 4 GB RAM                                  Windows 2003 Server             Oracle 10G          1023 sec  205 sec   159 sec   Home PC
Dell Dimension E521, 4 GB RAM                                  Windows 2003 Server             MS SQL Server 2005  741 sec   699 sec   293 sec
HP DL 380 Proliant Server, 0.5 GB RAM, Intel Xeon 3.2 GHz      Red Hat Linux                   Oracle 10G          1432 sec  413 sec   386 sec   Linux with little RAM
Exasol Test System, 2 Quad Core Intel CPU, 32 GB RAM, 2 nodes  Exacluster (Linux Microkernel)  Exasol              22 sec    2 sec     0 sec     In Memory DB

In-memory DB is a factor of 30-50 faster

That's the speed of sound relative to a bicycle

With generic Intel hardware

Worth looking at several of these new systems
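The speedup can be recomputed from the 16 Mio row timings in the table. This is a small Python sketch; the dictionary keys are shorthand labels chosen for the example, and the numbers are the measured seconds from the slide.

```python
# Query times in seconds for 16 Mio aggregated fact rows, taken from the table.
times_16_mio = {
    "IBM 9117-570 / Oracle 10G": 1200,
    "Dell Dimension E521 / Oracle 10G": 1023,
    "Dell Dimension E521 / MS SQL Server": 741,
    "HP DL 380 / Oracle 10G": 1432,
}
exasol_seconds = 22

# Speedup of the Exasol test system relative to each traditional setup.
speedups = {name: seconds / exasol_seconds for name, seconds in times_16_mio.items()}

for name, factor in speedups.items():
    print(f"{name}: about {factor:.0f} times slower than Exasol")
```

For this one query, the ratios range from roughly 34 to 65, the same order of magnitude as the factor quoted on the slide.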

A New Generation DB System


[Bar chart: query times in seconds (axis 0 to 1600) for DD SQL, DD ORA, HP, IBM and Exa]

The Contenders

Oracle 11 G

MySQL

MonetDB

LucidDB

Greenplum (their own hardware)

Exasol

(their own hardware)

The Test Server

Intel Dual Xeon E 5205

16 GB RAM

2 x 250 GB SATA Disk

64 Bit Debian Linux

Interesting DB Systems That Were Not Tested

Teradata

Oracle ExaData

Netezza

Vertica

Infobright

Kognitio

The field is very active and new products and approaches keep entering the market.

MonetDB

Origin: Result of research at CWI in the Netherlands

Open Source: Yes

Free of Charge: Yes

Remarks:
o Recent publicity through a paper in Communications of the ACM: Breaking the Memory Wall in MonetDB
o Constantly changing as research progresses
o Easy to get into direct contact with the developers

Quote from the website:
MonetDB is an open-source database system for high-performance
applications in data mining, OLAP, GIS, XML Query, text and multimedia
retrieval.

LucidDB

Origin: Formerly part of LucidEra in San Mateo, California

Open Source: Yes

Free of Charge: Yes

Remarks:
o Emphasizes ease of configuration and maintenance
o Mostly written in Java

Quote from the website:
LucidDB is the first and only open-source RDBMS purpose-built entirely for
data warehousing and business intelligence. It is based on architectural
cornerstones such as column-store, bitmap indexing, hash join/aggregation,
and page-level multiversioning.

Greenplum

Origin: Located in San Mateo, California. Postgres based.

Open Source: Based on Open Source Technology

Free of Charge: No

Remarks:
o Based on a hardware architecture similar to that of Exasol
o Highly configurable and tunable, lots of features
o Column store is an option, default is row store

Quote from the website:
Greenplum Database utilizes a shared-nothing MPP (massively parallel
processing) architecture that has been designed from the ground up for BI
and analytical processing using commodity hardware. In this architecture,
data is automatically partitioned across multiple 'segment' servers, and
each 'segment' owns and manages a distinct portion of the overall data.
All communication is via a network interconnect -- there is no disk-level
sharing or contention to be concerned with (i.e. it is a 'shared-nothing'
architecture).
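The shared-nothing idea from the quote can be sketched in a few lines of Python. This is not Greenplum's implementation. In-process lists stand in for segment servers, a simple modulo stands in for the distribution hash, and all function names are invented for the example.

```python
from collections import defaultdict


def partition(rows, n_segments):
    """Assign each (key, amount) row to exactly one segment by its key."""
    segments = [[] for _ in range(n_segments)]
    for key, amount in rows:
        segments[key % n_segments].append((key, amount))
    return segments


def local_aggregate(segment_rows):
    """Each segment sums only the rows it owns, without looking at other segments."""
    partial = defaultdict(float)
    for key, amount in segment_rows:
        partial[key] += amount
    return dict(partial)


def merge(partials):
    """The coordinator combines the partial results sent over the interconnect."""
    total = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)


rows = [(1, 10.0), (2, 5.0), (1, 2.5), (3, 7.0), (2, 1.0)]
segments = partition(rows, n_segments=2)
result = merge(local_aggregate(segment) for segment in segments)
print(result)
```

Because every row lives on exactly one segment, the local aggregation steps are independent and can run in parallel. Only the small partial results have to travel over the network.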

Exasol

Origin: Developed from scratch in Nürnberg, Germany

Open Source: No

Free of Charge: No

Remarks:
o Based on a hardware architecture similar to that of Greenplum
o Pure column store DB
o Emphasizes ease of administration
o No need to create indexes or gather statistics
o Imitates some Oracle-isms for compatibility

Quote from the website:
The database has been specially developed for analysis and is being used
successfully for data warehousing, Web analytics, data mining applications
and more. In contrast with universal databases, this specialization means that the
data to be analyzed can be made available to analysis tools virtually in real time.
Typical Shared Nothing Node

Combine many of these, connected by Gigabit Ethernet

Results With 16 Mio Rows in the Fact Table


[Bar chart: query times in seconds (axis 0 to 2500) for Oracle, MySQL, LucidDB, MonetDB, Greenplum and Exasol. Bar values, in descending order: 2280, 460, 226, 31, 13, 10]

Oracle on a new 64-bit box is 4 times faster than on an average 32-bit box

Both Oracle and LucidDB were twice as fast after dropping all indexes on the fact table (those are the times in the chart)

We did not manage to tune MySQL to get acceptable performance

For a free system, LucidDB has good performance and little hassle

MonetDB needed a fix in the optimizer before coping with the query

Next generation in-memory DBs are at least one order of magnitude faster

Performance Scaling
[Line chart: query times in seconds (axis 0 to 400) at x-axis ticks 16, 160 and 320 for four series: Greenplum; Exasol (untuned, comparable hardware); Exasol (local dimensions, comparable hardware); Exasol (public demo system). Data labels, in descending order: 364, 288, 210, 183, 133, 105, 97, 54, 26, 13, 6, 3]

Both systems scale linearly

It is possible to query at least ten times the data volume efficiently

The vendors claim unlimited linear scaling by adding commodity hardware

Conclusion
Big Lessons

Database technology is in upheaval at the moment

By adopting the new technologies, you can totally revolutionize the way you access your data

Prices will fall rapidly. This is like the PC revolution.

Small Lessons

If you run Oracle on a 32-bit system, move to a 64-bit architecture. It will give you a factor of 4 without any pain

If your table scans are slow, drop all indexes

If you move to a new technology, you will get a factor of 50

The commercial systems are worth their money. Their SQL is more compatible, and they are more stable
