Oracle In-Memory - Game Changer in Data Warehousing and Business Intelligence

Dr.-Ing. Holger Friedrich

Agenda

Introduction
Columnar Stores
Oracle In-Memory
Analytics
Loading
Conclusions

2014 sumIT AG

03/2012

sumIT AG
Consulting and implementation services in Switzerland
Experts for Data Warehousing and Business Intelligence solutions
Focussed on Oracle technology
- BI Foundation specialized partner
- Data Warehousing specialized partner
- Exalytics competence center with own server
Our motto: Get Value From Data
Visit our web site: www.sumit.ch (in German)

Holger Friedrich
Computer Science diploma from Karlsruhe Institute of Technology (KIT)
Ph.D. in Robotics and Machine Learning
More than 16 years of experience with Oracle technology
Expert for
- Data Integration
- Data Warehousing
- Data Mining
- Business Intelligence
Technical Director of sumIT AG
First Oracle ACE for DWH/BI in Switzerland

Columnar Stores

Advantages
Best for queries that
- scan large quantities of data
- on a rather small set of columns
- compute aggregates on the results
High compression benefits on most columns (except columns whose values are mostly distinct)
Well suited for OLAP/BI
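
The compression point can be illustrated with a minimal Python sketch. It assumes a toy table and uses dictionary plus run-length encoding as stand-ins for columnar compression; the table, the column names and both helper functions are hypothetical, and this is not a description of Oracle's actual compression formats.

```python
from itertools import groupby

# Hypothetical toy fact table in row format: (order_id, region, amount).
rows = [
    (1, "north", 10.0),
    (2, "north", 20.0),
    (3, "north", 5.0),
    (4, "south", 7.5),
    (5, "south", 2.5),
]

# Columnar layout: one contiguous list per column instead of one tuple per row.
order_id, region, amount = (list(column) for column in zip(*rows))


def dictionary_encode(values):
    """Replace each value with an integer code into a sorted dictionary."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


def run_length_encode(codes):
    """Collapse runs of equal codes into (code, run_length) pairs."""
    return [(code, len(list(run))) for code, run in groupby(codes)]


# An aggregate over one column touches only that column's list.
total_amount = sum(amount)

# Few distinct values: two dictionary entries and two runs describe five rows.
region_dictionary, region_codes = dictionary_encode(region)
region_runs = run_length_encode(region_codes)

# All values distinct: the dictionary is as long as the column, nothing is saved.
order_dictionary, _ = dictionary_encode(order_id)
```

The `region` column shrinks to two dictionary entries and two runs, while the all-distinct `order_id` column gains nothing, which is the exception named on the slide.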

Drawbacks (Up To Now)
Some operations are very costly
- DML
- Queries retrieving entire rows
Less suited for OLTP
Complex DBMS infrastructure has to be built once more
- storage (management)
- security
- clustering
- disaster recovery
- ...

Competition
Niche vendors
- Exasol
- HP Vertica
- Infobright
- ParAccel
The usual suspects
- Microsoft (Columnstore Indexes)
- IBM
- Teradata
- and of course SAP/HANA

Oracle In-Memory

Columnar Stores/DBs - Oracle's Flavour
Transparent column store managed next to the row store
Not either/or
Persistent storage remains row-based as before
Column store is kept in sync with DML in real time
The entire Oracle DB ecosystem remains unchanged
- security
- backup
- disaster recovery
- RAC
- ...
NO application changes required!

Technology Gems
1. In-memory storage index
2. Filtering on binary compressed data
3. Columnar storage of selected columns
4. Transparent querying across storage hierarchy
5. Real-time background refresh of the columnar store
6. Parallel query execution on the columnar store
7. SIMD vector processing
8. In-memory fault tolerance on RAC
9. On-demand building of multi-dimensional aggregation data structure (almost an on-the-fly MOLAP cube)

In-Memory Storage Index
Column data is stored separately in in-memory compression units (IMCUs)
In-Memory Storage Indexes store Min/Max values for each column for each IMCU
IMCUs whose Min/Max range lies outside a query predicate can be safely ignored during processing
v$mystat shows information about the number of IMCUs assessed vs. IMCUs pruned
Example: Find sales from stores with a store_id of 8 or higher
[Figure: SALES table in column format in memory, split into four IMCUs with Min/Max of 1/3, 4/7, 8/12 and 13/15]
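
The Min/Max pruning idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `Imcu` class and `scan_at_least` function are hypothetical names, the Min/Max pairs follow the slide, and the individual values inside each unit are made up. It does not reflect Oracle's internal implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Imcu:
    """Toy stand-in for one in-memory compression unit of a single column."""
    values: list
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self):
        # The "storage index" entry: Min/Max are computed once, when the unit is built.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_at_least(imcus, threshold):
    """Return values >= threshold plus the number of units scanned and pruned."""
    matches, scanned, pruned = [], 0, 0
    for imcu in imcus:
        if imcu.max_value < threshold:
            pruned += 1  # no value in this unit can satisfy the predicate
            continue
        scanned += 1
        matches.extend(v for v in imcu.values if v >= threshold)
    return matches, scanned, pruned


# store_id column split into four units; Min/Max pairs as on the slide,
# the values in between are invented for illustration.
store_id_imcus = [
    Imcu([1, 3, 2]),
    Imcu([4, 7, 5]),
    Imcu([8, 12, 10]),
    Imcu([13, 15, 14]),
]
matches, scanned, pruned = scan_at_least(store_id_imcus, 8)
```

For the predicate store_id >= 8, the first two units are skipped on their Max value alone and only the last two are actually scanned.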

SIMD Vector Processing
Single Instruction processing Multiple Data values
Evaluation of a set of column values in a single CPU instruction cycle
Potential to speed up processing to billions of rows per second
Example: Find all sales with PROMO_ID 9999
[Figure: multiple PROMO_ID values are loaded from memory into a CPU vector register and compared against 9999 in one cycle]

In-Memory Aggregation
New optimizer transformation Vector Group By
Resembles the well-known star transformation
Two-phase, six-step process
Phase 1 - preparation
1. Scan dimensions
2. Build key vectors
3. Prepare accumulator
4. Build temporary tables for the selected dimension attributes
Phase 2 - computation
5. Scan facts w.r.t. key vectors
6. Join filtered facts with temporary tables

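
The six steps can be sketched in Python on a toy star schema. Everything here is an assumption for illustration: the tables, the `prepare_dimension` helper and the use of dictionaries as key vectors are hypothetical, and the sketch aggregates into the accumulator while scanning the facts, which is one plausible reading of steps 5 and 6 rather than a description of the optimizer's internals.

```python
# Hypothetical toy star schema; all names and values are made up.
stores = [(1, "north"), (2, "south"), (3, "north"), (4, "west")]  # (store_id, region)
products = [(10, "toys"), (11, "food"), (12, "toys")]  # (product_id, category)
sales = [  # fact rows: (store_id, product_id, amount)
    (1, 10, 5.0), (1, 11, 2.0), (2, 10, 3.0), (3, 10, 4.0),
    (4, 12, 9.0), (3, 11, 1.0), (2, 12, 6.0),
]


def prepare_dimension(dimension, keep):
    """Steps 1, 2 and 4: scan a dimension, build its key vector and temp table."""
    group_of_attribute = {}
    key_vector = {}  # join key -> dense group number; filtered keys are absent
    for join_key, attribute in dimension:
        if keep(attribute):
            group = group_of_attribute.setdefault(attribute, len(group_of_attribute))
            key_vector[join_key] = group
    temp_table = {group: attribute for attribute, group in group_of_attribute.items()}
    return key_vector, temp_table


store_keys, store_tmp = prepare_dimension(stores, lambda region: region != "west")
product_keys, product_tmp = prepare_dimension(products, lambda category: True)

# Step 3: accumulator with one cell per combination of group numbers.
accumulator = [[None] * len(product_tmp) for _ in store_tmp]

# Step 5: a single pass over the facts, driven by the key vectors.
for store_id, product_id, amount in sales:
    s = store_keys.get(store_id)
    p = product_keys.get(product_id)
    if s is None or p is None:
        continue  # fact row filtered out by a dimension predicate
    cell = accumulator[s][p]
    accumulator[s][p] = amount if cell is None else cell + amount

# Step 6: join the aggregated cells back to the dimension attributes.
result = {
    (store_tmp[s], product_tmp[p]): total
    for s, row in enumerate(accumulator)
    for p, total in enumerate(row)
    if total is not None
}
```

The fact table is read exactly once, and the join work shrinks to looking up small integers, which is why the approach pays off with massive fact tables and comparatively small dimensions.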
In-Memory Aggregation - XPLAN

In-Memory on RAC Including Fault Tolerance
Distribution of large objects
- in-memory compression units (IMCUs) are spread across the RAC nodes
- automatically (default)
- BY ROWID RANGE
- BY {SUB}PARTITION
Fault tolerance (engineered systems only)
- DUPLICATE clause to keep redundant IMCU copies on nodes
- DUPLICATE ALL = each IMCU copied to every node

Assessment
The In-Memory option can improve query performance dramatically
Data scanning benefits in particular
Joins & Vector Group By aggregations are accelerated as well
However, it is advanced technology, not magic
Sorting, classic aggregation etc. still take time
[Figure: time bars (t) for Row Store vs. In-Memory, with the segments Scan Data, Aggregate and Join / Sort / Group / Aggregate]

Analytics

Unprecedented Performance for
Reporting queries
- Simple
- SQL*Analytics
(Tool based) OLAP
Dimensional queries

Simple Reporting Queries
Query characteristics
- few joins
- simple one-step aggregations (if at all)
- lots of filtering
- sometimes many rows to be displayed
Processing
- scanning in columnar store uses IMCU storage indexes
- join by Bloom filtering applied on columnar store
- scanning and joining effort far outweighs other processing effort
- but a large number of rows may need time to transfer to the client
- SIMD computation can be used on a large scale
In-Memory impact
- high performance gains

SQL*Analytics Reporting Queries
Query characteristics
- some joins
- complex analytic functions
- lots of filtering
- often many attributes to be displayed
Processing
- scanning in columnar store uses IMCU storage indexes
- join by Bloom filtering applied on columnar store
- share of processing effort other than scanning and joining rises
- SIMD computation can be used
In-Memory impact
- performance gain, but smaller than for simpler reporting queries

(Tool Based) OLAP
Query characteristics
- horrendously complex queries
- chaining of WITH clauses
- complex analytic functions and aggregations
Processing
- short scanning time
- hard for the optimizer to find an efficient plan
- materialization of temporary results 'breaks' pure columnar processing
- intermediate computation effort exceeds columnar in-memory share of effort
In-Memory impact
- gain of using the In-Memory option depends on query complexity
- the need for pre-computing (some) aggregates remains

Dimensional Queries
Characteristics
- few simple joins (star shape)
- filtering on dimensions
- most aggregations along dimension attributes
- massive amount of facts
- sometimes massive dimensions
Technology & consequences
- short scanning time
- application of optimizer's new vector-group-by transformation
In-Memory impact
- high performance gain

Different Reporting Queries

Acceleration Of Reporting Queries

report type              | no of rows      | result set | row store (SGA) | columnar store | times X
-------------------------|-----------------|------------|-----------------|----------------|--------
simple                   | 400K            | 35         | 10ms            | 2ms            |
join (bloom)             | 14M & 55K       | 2M         | 25s             | 25s            |
join, top10 (analytics)  | 14M & 55K       | 10         | 2s              | 1s             |
dimensional (vector by)  | 14M & 1.8K & 72 | 88         | 8s              | 0.8s           | 10

Demo comparing SGA row based vs. in-memory columnar store
- Small Virtual Machine
- No SIMD support in demo environment
- Serial execution
- Higher gains on enterprise infrastructure
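
The "times X" factor is simply the row store time divided by the columnar store time. A small Python sketch, using the timings from the demo table converted to milliseconds (the dictionary layout and the `speedup` helper are illustrative assumptions):

```python
# Timings from the demo table, converted to milliseconds:
# report type -> (row store (SGA), columnar store)
timings_ms = {
    "simple": (10, 2),
    "join (bloom)": (25_000, 25_000),
    "join, top10 (analytics)": (2_000, 1_000),
    "dimensional (vector by)": (8_000, 800),
}


def speedup(row_store_ms, columnar_ms):
    """Acceleration factor: how many times faster the columnar store answered."""
    return row_store_ms / columnar_ms


factors = {report: speedup(*pair) for report, pair in timings_ms.items()}
```

The dimensional query yields a factor of 10, matching the value on the slide, while the Bloom filter join with its 2M-row result set shows no gain in this demo.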

Loading

Data Quality & Consistency Assessment
Typical tests
- column value checks
- intra row checks
- inter row checks
- inter table checks
Challenge
- often complex conditions
- functions have to be applied
- costly, also in columnar store
- e.g. not REGEXP_LIKE (ssnumber, '\d{3}\.\d{4}\.\d{4}\.\d{2}')
Observation
- gain depends significantly on test complexity

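
A column value check of this kind can be sketched in Python with the same pattern. The sample values and the `failed_rows` helper are made up for illustration. Note that REGEXP_LIKE matches the pattern anywhere in the value unless it is anchored, which `re.search` mirrors; the `anchored` flag is an added assumption showing the stricter variant.

```python
import re

# Same pattern as in the REGEXP_LIKE example; the sample values are made up.
SSN_PATTERN = re.compile(r"\d{3}\.\d{4}\.\d{4}\.\d{2}")


def failed_rows(column, pattern=SSN_PATTERN, anchored=False):
    """Return the positions of all values that do not match the pattern."""
    match = pattern.fullmatch if anchored else pattern.search
    return [position for position, value in enumerate(column) if not match(value)]


ssnumber = [
    "756.1234.5678.97",    # well-formed
    "7561234567897",       # separators missing
    "756.12.5678.97",      # second group too short
    "x756.9999.0000.11y",  # well-formed core with surrounding noise
]

loose = failed_rows(ssnumber)                  # unanchored, like the slide's check
strict = failed_rows(ssnumber, anchored=True)  # whole value must match
```

The function has to be evaluated for every value in the column, so the cost of the test itself quickly dominates the cost of scanning, regardless of the storage format.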
Meta Data Transformation During ETL
Typical scenario
- transformation of source dependent (domain) data into DWH standard representation
- usually using mapping tables
- e.g. sourcesys = 'SAP' and gender = '0' => return 'male'
- typical case of joins without aggregation
Challenge
- staging tables initially not in column store
Strategy
- populate only columns to be transformed into column store
- check population time vs. speed gain
[Figure: staged source rows (src system, gender) are joined with a mapping table (src system, gender, DWH gender) to derive gender entries for all rows; in the example the src system is JD Edwards]

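
The mapping-table lookup can be sketched in Python as follows. Only the SAP / '0' => 'male' entry comes from the slide; the other mapping entries, the staged rows, the 'unknown' default and the `to_dwh_gender` helper are hypothetical.

```python
# Hypothetical mapping table: (source system, source code) -> DWH standard value.
# Only the SAP / '0' -> 'male' entry comes from the slide; the rest is made up.
gender_mapping = {
    ("SAP", "0"): "male",
    ("SAP", "1"): "female",
    ("JDE", "M"): "male",
    ("JDE", "F"): "female",
}

# Staged source rows: (source system, source gender code).
staged = [("SAP", "0"), ("JDE", "F"), ("SAP", "1"), ("JDE", "X")]


def to_dwh_gender(rows, mapping, default="unknown"):
    """Look up every staged row in the mapping table (a join without aggregation)."""
    return [mapping.get(row, default) for row in rows]


dwh_gender = to_dwh_gender(staged, gender_mapping)
```

Every staged row produces exactly one output row, so there is no aggregation to shrink the result; the work is dominated by the lookup itself.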
Key Transformations
Typical scenario
- transformation of source dependent natural/business keys into DWH owned surrogate representation
- reverse lookups for data mart loading
- multiple (outer) joins against target tables
- typical case of (outer) joins without aggregation
Challenge
- staging tables initially not in column store
Strategy
- populate only rows to be transformed into column store
- check population time vs. speed gain
- also works with lookup tables in columnar format and the staging table in row format

Example Key Lookup Query

select
  s.invoicenumber, s.year, s.audit_id, s.cutoffdt,
  r.id invoice_id, m.id member_id              -- 4. return DWH-IDs plus some other stuff
from (select * from st_db_rechnung_in_t
      where rownum < 100000) s,                -- 1. scan staging table
  db_rechnung_ht r,
  pv_mitglied_ht m
where s.invoicenumber = m.invoicenumber (+)    -- 3. outer join to lookup tables
  and s.invoicenumber = r.invoicenumber (+)
  and s.year          = r.year (+)
  and s.incoiceitem   = r.incoiceitem (+)
  and s.srcmodifieddt > SYSDATE-720            -- 2. take last 2 years

Chaining Of Bloom Filters

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                     | 99999 |  9081K|       |  2848   (1)| 00:00:01 |
|*  1 |  HASH JOIN OUTER                |                     | 99999 |  9081K|  8008K|  2848   (1)| 00:00:01 |
|   2 |   JOIN FILTER CREATE            | :BF0000             | 99999 |  6835K|       |  1090   (1)| 00:00:01 |
|*  3 |    HASH JOIN OUTER              |                     | 99999 |  6835K|  6840K|  1090   (1)| 00:00:01 |
|   4 |     JOIN FILTER CREATE          | :BF0001             | 99999 |  5664K|       |   278   (1)| 00:00:01 |
|*  5 |      VIEW                       |                     | 99999 |  5664K|       |   278   (1)| 00:00:01 |
|*  6 |       COUNT STOPKEY             |                     |       |       |       |            |          |
|   7 |        TABLE ACCESS FULL        | ST_DB_RECHNUNG_IN_T | 99999 |  3613K|       |   278   (1)| 00:00:01 |
|   8 |     JOIN FILTER USE             | :BF0001             |   395K|  4637K|       |    29   (7)| 00:00:01 |
|*  9 |      TABLE ACCESS INMEMORY FULL | PV_MITGLIED_HT      |   395K|  4637K|       |    29   (7)| 00:00:01 |
|  10 |   JOIN FILTER USE               | :BF0000             |   781K|    17M|       |    73  (13)| 00:00:01 |
|* 11 |    TABLE ACCESS INMEMORY FULL   | DB_RECHNUNG_HT      |   781K|    17M|       |    73  (13)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------

1. scan staging table
2. create Bloom filters
3. apply Bloom filters on lookup tables
4. hash join removes Bloom filter false positives
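
The four steps can be sketched in Python with a deliberately small Bloom filter. All of it is illustrative: the `BloomFilter` class, its sizing, the SHA-256 based hashing and the toy tables are assumptions, not Oracle's implementation.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for position in self._positions(key):
            self.bits |= 1 << position

    def might_contain(self, key):
        return all((self.bits >> position) & 1 for position in self._positions(key))


# 1. Scan the (small) staging table: hypothetical (invoicenumber, year) rows.
staging = [(1001, 2014), (1002, 2014), (1003, 2013), (5000, 2014)]

# 2. Create a Bloom filter on the staging join keys.
bloom = BloomFilter()
for invoicenumber, _ in staging:
    bloom.add(invoicenumber)

# 3. Apply the filter while scanning the (large) lookup table, so that most
#    non-matching rows are dropped before they reach the join.
lookup = {number: f"id-{number}" for number in range(1000, 1100)}
candidates = {key: value for key, value in lookup.items() if bloom.might_contain(key)}

# 4. The hash (outer) join does the exact match and thereby discards any
#    false positives that slipped through the filter.
joined = [(number, year, candidates.get(number)) for number, year in staging]
```

Because a Bloom filter never produces false negatives, filtering the lookup table first cannot change the join result; it only reduces the number of rows the hash join has to probe.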


Conclusions
Oracle In-Memory is a game changer on the DWH/BI market
- in contrast to niche players it is absolutely enterprise ready
- in contrast to the other big players its use requires no application modifications
Therefore, In-Memory provides a big leap in performance with
- low risks
- low project, infrastructure, maintenance & development costs
However, In-Memory is no silver bullet
- Speed-up varies greatly with query complexity
- Good design of ETL processes & analyses remains important
- Powerful infrastructure is still required (think about using Oracle Engineered Systems)
