CON5193 Oracle In-Memory The Game Changer in Data Warehousing and Business Intelligence

Oracle In-Memory - Game Changer in
Data Warehousing and Business Intelligence

Dr.-Ing. Holger Friedrich
Agenda
Introduction
Columnar Stores
Oracle In-Memory
Analytics
Loading
Conclusions
2014 sumIT AG
03/2012
sumIT AG
Consulting and implementation services in Switzerland
Experts for Data Warehousing and Business Intelligence solutions
Focussed on Oracle technology

BI Foundation specialized partner
Data Warehousing specialized partner
Exalytics competence center with own server
Our motto: Get Value From Data
Visit our web site: www.sumit.ch
(in German)
Holger Friedrich
Computer Science diploma from Karlsruhe Institute of Technology (KIT)
Ph.D. in Robotics and Machine Learning
More than 16 years of experience with Oracle technology
Expert for Data Integration, Data Warehousing, Data Mining and Business Intelligence
Technical Director of sumIT AG

First Oracle ACE for DWH/BI in Switzerland
Columnar Databases
Advantages
Best for queries that
- scan large quantities of data
- on a rather small set of columns
- compute aggregates on the results
High compression benefits on most columns
(except ones containing distinct values)
Well suited for OLAP/BI
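To make the access pattern behind these advantages concrete, here is a minimal sketch in Python. It is not how any database stores data internally; the table contents and the function names are made up for illustration. It only shows that an aggregate over one column has to touch every record in a row layout but just one list in a column layout.

```python
# Toy data: the same three sales records in two layouts (all values are made up).
rows = [
    {"store_id": 1, "promo_id": 9999, "amount": 10.0},
    {"store_id": 2, "promo_id": 17, "amount": 25.5},
    {"store_id": 1, "promo_id": 9999, "amount": 4.5},
]

# Column layout: one contiguous list per column.
columns = {name: [record[name] for record in rows] for name in rows[0]}


def sum_amount_row_store(table):
    """Visit every record (with all of its columns) to read a single field."""
    return sum(record["amount"] for record in table)


def sum_amount_column_store(cols):
    """Scan only the one column the aggregate needs."""
    return sum(cols["amount"])


total_rows = sum_amount_row_store(rows)
total_cols = sum_amount_column_store(columns)
print(total_rows, total_cols)
```

Both functions return the same total; the difference is only in how much data has to be read to get there.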
Drawbacks (Up To Now)

Some operations very costly
- DML
- Queries retrieving entire rows
Less suited for OLTP
Complex DBMS infrastructure has to be built once more
- storage (management)
- security
- clustering
- disaster recovery
- ...
Competition
Niche vendors
- Exasol
- HP Vertica
- Infobright
- ParAccel
The usual suspects
- Microsoft (Columnstore Indexes)
- IBM
- Teradata
- and of course SAP/HANA
Oracle In-Memory
Columnar Stores/DBs - Oracle's Flavour
transparent column store managed next to the row store
- not either/or
- persistent storage row-based as before
- column store synchronized with DML in real time
the entire Oracle DB ecosystem remains unchanged
- security
- backup
- disaster recovery
- RAC
- ...
NO application changes required!
Technology Gems
1. In-memory storage index
2. Filtering on binary compressed data
3. Columnar storage of selected columns
4. Transparent querying across storage hierarchy
5. Real-time background refresh of the columnar store
6. Parallel query execution on the columnar store
7. SIMD vector processing
8. In-memory fault tolerance on RAC
9. On-demand building of multi-dimensional aggregation data structure
   (almost an on-the-fly MOLAP cube)
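Item 2, filtering on compressed data, can be illustrated with plain dictionary encoding. The sketch below is an assumption-laden toy in Python, not Oracle's actual compression format: the column values and the helper names `dictionary_encode` and `filter_equals` are hypothetical. The point is that the predicate value is translated into a code once, and the scan then compares small integers without decompressing the column.

```python
def dictionary_encode(values):
    """Return (dictionary, codes): sorted distinct values and one small int per row."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


def filter_equals(dictionary, codes, wanted):
    """Return the row positions where the column equals `wanted`, comparing codes only."""
    try:
        wanted_code = dictionary.index(wanted)
    except ValueError:
        return []  # value is not in the dictionary, so no row can match
    return [pos for pos, code in enumerate(codes) if code == wanted_code]


region = ["EMEA", "APAC", "EMEA", "AMER", "EMEA"]  # made-up column
dictionary, codes = dictionary_encode(region)
matches = filter_equals(dictionary, codes, "EMEA")
print(dictionary, codes, matches)
```

A column with few distinct values yields a tiny dictionary and short codes, which is also why such columns compress well.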
In-Memory Storage Index

Column data is stored separately in in-memory compression units (IMCUs)
In-Memory Storage Indexes store Min/Max values for each column for each IMCU
Example: Find sales from stores with a store_id of 8 or higher
IMCUs with Min/Max outside a query predicate can be safely ignored during processing
v$mystat shows information about number of IMCUs assessed vs. IMCUs pruned
[Figure: SALES table in memory in column format; the IMCUs of one column have Min 1/Max 3, Min 4/Max 7, Min 8/Max 12 and Min 13/Max 15]
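The Min/Max pruning from the example can be sketched in a few lines of Python. This is only an illustration of the idea, assuming each IMCU is a plain list of store_id values; the values inside the units and the function names are invented, and only the Min/Max pairs follow the slide.

```python
def build_storage_index(imcus):
    """Record (min, max) for every compression unit of one column."""
    return [(min(unit), max(unit)) for unit in imcus]


def scan_greater_equal(imcus, index, threshold):
    """Return matching values and the number of units skipped via the index."""
    hits, pruned = [], 0
    for unit, (lo, hi) in zip(imcus, index):
        if hi < threshold:
            pruned += 1  # no value in this unit can satisfy the predicate
            continue
        hits.extend(value for value in unit if value >= threshold)
    return hits, pruned


# Four units of store_id values whose Min/Max match the slide: 1-3, 4-7, 8-12, 13-15.
store_id_imcus = [[1, 2, 3, 2], [4, 7, 5, 6], [8, 12, 9, 10], [13, 15, 14, 13]]
index = build_storage_index(store_id_imcus)
hits, pruned = scan_greater_equal(store_id_imcus, index, 8)
print(index, pruned, hits)
```

For the predicate store_id >= 8, the first two units are skipped without looking at a single value, and only the last two are scanned.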
SIMD Vector Processing
Example: Find all sales with PROMO_ID 9999
Single Instruction processing Multiple Data values
Evaluation of a set of column values in a single CPU instruction cycle
Potential to speed up processing to billions of rows per second
[Figure: the CPU loads multiple PROMO_ID values from memory into a vector register and compares all of them with 9999 in 1 cycle]
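Python has no access to CPU vector registers, so the following is only a software imitation of the idea ("SIMD within a register"), not what the database does. It assumes four 16-bit PROMO_ID values packed into one 64-bit word; the lane layout, the constants and the function names are choices made for this sketch. All four lanes are compared with a handful of whole-word operations instead of four separate comparisons.

```python
LANES = 4                                   # four 16-bit lanes in one 64-bit word
LANE_BITS = 16
WORD_MASK = (1 << (LANES * LANE_BITS)) - 1
LOW15 = int("7FFF" * LANES, 16)             # 0x7FFF repeated in every lane


def pack(values):
    """Pack up to four 16-bit values into one integer, lane 0 in the lowest bits."""
    word = 0
    for lane, value in enumerate(values):
        assert 0 <= value < (1 << LANE_BITS)
        word |= value << (lane * LANE_BITS)
    return word


def compare_equal(word, target):
    """Return one boolean per lane (lane == target) using whole-word operations."""
    diff = word ^ pack([target] * LANES)    # a lane becomes 0 exactly where it matched
    # Adding 0x7FFF sets a lane's top bit if any of its low 15 bits is non-zero;
    # the sum never exceeds 0xFFFE, so nothing carries into the next lane.
    nonzero_low = (diff & LOW15) + LOW15
    # The top bit stays clear only in lanes that were entirely zero; invert to flag them.
    zero_flags = ~(nonzero_low | diff | LOW15) & WORD_MASK
    top_bit = LANE_BITS - 1
    return [bool((zero_flags >> (lane * LANE_BITS + top_bit)) & 1) for lane in range(LANES)]


promo_ids = [9999, 17, 9999, 4711]          # made-up column values
matches = compare_equal(pack(promo_ids), 9999)
print(matches)
```

The final list comprehension only unpacks the result for printing; the comparison itself is the three word-wide expressions above it.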
In-Memory Aggregation
New optimizer transformation Vector Group By
Resembles well-known star transformation
Two phase, 6 step process
Phase 1 - preparation
1. Scan dimensions
2. Build key vectors
3. Prepare accumulator
4. Build tmp-tables for dim select attributes
Phase 2 - computation
5. Scan facts w.r.t. key vectors
6. Join filtered facts with tmp-tables
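The six steps can be traced on toy data. The sketch below is a simplified Python analogy under stated assumptions, not the optimizer's implementation: dimensions are plain dicts, the key vector maps each join key to a dense group id, the accumulator is a small 2-D array, and the tables, the filter and the name `build_key_vector` are all invented.

```python
# Dimension tables (made-up data): join key -> attribute used for grouping.
stores = {1: "North", 2: "South", 3: "North", 4: "West"}
products = {10: "Food", 11: "Toys", 12: "Food"}

# Fact rows: (store_id, product_id, amount).
sales = [(1, 10, 5.0), (2, 11, 7.0), (3, 12, 1.0), (4, 10, 9.0), (1, 11, 2.0), (3, 10, 4.0)]


def build_key_vector(dimension, keep):
    """Steps 1-2: scan a dimension and map each kept join key to a dense group id."""
    group_ids, key_vector = {}, {}
    for join_key, attribute in dimension.items():
        if keep(attribute):
            key_vector[join_key] = group_ids.setdefault(attribute, len(group_ids))
    # Step 4: the reverse mapping plays the role of the temporary table.
    labels = {gid: attribute for attribute, gid in group_ids.items()}
    return key_vector, labels


store_kv, store_labels = build_key_vector(stores, lambda region: region != "West")
product_kv, product_labels = build_key_vector(products, lambda category: True)

# Step 3: accumulator with one cell per combination of dense group ids.
accumulator = [[None] * len(product_labels) for _ in store_labels]

# Step 5: one pass over the facts; rows whose keys are missing from a key vector drop out.
for store_id, product_id, amount in sales:
    s, p = store_kv.get(store_id), product_kv.get(product_id)
    if s is not None and p is not None:
        accumulator[s][p] = (accumulator[s][p] or 0.0) + amount

# Step 6: join the filled cells back to the dimension attributes.
result = {
    (store_labels[s], product_labels[p]): total
    for s, row in enumerate(accumulator)
    for p, total in enumerate(row)
    if total is not None
}
print(result)
```

The fact table is scanned once, and aggregation happens by array indexing rather than by hashing or sorting the grouping columns.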
In-Memory Aggregation - XPLAN
In-Memory on RAC Including Fault Tolerance

Distribution of the in-memory compression units (IMCUs) of large objects
- automatically (default)
- BY ROWID RANGE
- BY {SUB}PARTITION
Fault tolerance (engineered systems only)
- DUPLICATE clause to keep redundant IMCU copies on nodes
- DUPLICATE ALL = each IMCU copied to every node
Assessment
The In-Memory option can improve query performance dramatically
In particular, data scanning benefits
Joins & vector group by aggregations are accelerated as well
However, it is advanced technology, not magic
Sorting, classic aggregation etc. still take time
[Figure: timelines over t comparing Row Store and In-Memory; each consists of Scan Data followed by Join / Sort / Group / Aggregate]
Analytics
Unprecedented Performance for

Reporting queries
- Simple
- SQL*Analytics
(Tool based) OLAP

Dimensional queries
Simple Reporting Queries

Query characteristics
- few joins
- simple one-step aggregations (if at all)
- lots of filtering
- sometimes many rows to be displayed
Processing
- scanning in columnar store uses IMCU storage indexes
- join by Bloom filtering applied on columnar store
- scanning and joining effort far outweighs other processing effort
- but large number of rows may need time to transfer to client
- SIMD computation can be used on a large scale
In-Memory impact
- high performance gains
SQL*Analytics Reporting Queries

Query characteristics
- some joins
- complex analytic functions
- lots of filtering
- often many attributes to be displayed
Processing
- scanning in columnar store uses IMCU storage indexes
- join by Bloom filtering applied on columnar store
- share of processing effort other than scanning and joining rises
- SIMD computation can be used
In-Memory impact
- performance gain, but smaller than for simpler reporting queries
(Tool Based) OLAP

Query characteristics
- horrendously complex queries
- chaining of WITH clauses
- complex analytic functions and aggregations
Processing
- short scanning time
- hard for optimizer to find efficient plan
- materialization of temporary results 'breaks' pure columnar processing
- intermediate computation effort exceeds columnar in-memory share of effort
In-Memory impact
- gain of using the In-Memory option depends on query complexity
- the need for pre-computing (some) aggregates remains
Dimensional Queries
Characteristics
- few simple joins (star shape)
- filtering on dimensions
- most aggregations along dimension attributes
- massive amount of facts
- sometimes massive dimensions
Technology & consequences
- short scanning time
- application of optimizer's new vector-group-by transformation
In-Memory impact
- high performance gain
Different Reporting Queries
Acceleration Of Reporting Queries
| report type             | no of rows      | result set | row store (SGA) | columnar store | times X |
| simple (SGA)            | 400K            | 35         | 10ms            | 2ms            |         |
| join (bloom)            | 14M & 55K       | 2M         | 25s             | 25s            |         |
| join, top10 (analytics) | 14M & 55K       | 10         | 2s              | 1s             |         |
| dimensional (vector by) | 14M & 1.8K & 72 | 88         | 8s              | 0.8s           | 10      |
Demo comparing SGA row based vs. in-memory columnar store
- Small virtual machine
- No SIMD support in demo environment
- Serial execution
Higher gains on enterprise infrastructure
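The "times X" column is simply the row-store time divided by the columnar-store time. As a small worked example, the Python snippet below recomputes that ratio from the timings listed in the table; the dictionary keys are labels chosen for this sketch, and the ratios are derived here rather than taken from the slide.

```python
# (row store time, columnar store time) in seconds, as listed in the table.
timings_seconds = {
    "simple": (0.010, 0.002),
    "join (bloom)": (25.0, 25.0),
    "join, top10 (analytics)": (2.0, 1.0),
    "dimensional (vector by)": (8.0, 0.8),
}

# Speed-up factor = row store time / columnar store time, rounded to one decimal.
speedup = {
    report: round(row_store / columnar, 1)
    for report, (row_store, columnar) in timings_seconds.items()
}

for report, factor in speedup.items():
    print(f"{report}: {factor}x")
```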
Loading
Data Quality & Consistency Assessment

Typical tests
- column value checks
- intra row checks
- inter row checks
- inter table checks
Challenge
- often complex conditions
- functions have to be applied
- costly, also in columnar store
- e.g. not REGEXP_LIKE (ssnumber, '\d{3}\.\d{4}\.\d{4}\.\d{2}')
Observation
- gain depends significantly on test complexity
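To show what such a column value check does to each row, here is the same pattern applied in Python with the standard `re` module. The sample values and the function name `invalid_ssnumbers` are made up. Like REGEXP_LIKE, `search` succeeds if the pattern occurs anywhere in the value; the function has to run once per row regardless of how the column is stored, which is why this kind of test is costly.

```python
import re

# Same pattern as in the REGEXP_LIKE example: 3 digits, 4 digits, 4 digits, 2 digits.
SSN_PATTERN = re.compile(r"\d{3}\.\d{4}\.\d{4}\.\d{2}")


def invalid_ssnumbers(values):
    """Return the values in which the pattern does not occur."""
    return [value for value in values if not SSN_PATTERN.search(value)]


sample = ["756.1234.5678.97", "7561234567897", "756.12.5678.97"]  # made-up test values
bad = invalid_ssnumbers(sample)
print(bad)
```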
Meta Data Transformation During ETL

Typical scenario
- transformation of source dependent (domain) data into DWH standard representation
- usually using mapping tables
- e.g. sourcesys='SAP' and gender = '0' => return 'male'
- typical case of joins without aggregation
Challenge
- staging tables initially not in column store
Strategy
- populate only columns to be transformed into column store
- check population time vs. speed gain
[Figure: staged src table (src system is JD Edwards, gender entries for all rows) joined to a mapping table with the columns src system, gender and DWH gender]
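The mapping-table join can be sketched as a keyed lookup. In the Python example below, only the SAP / '0' => 'male' entry comes from the slide; every other mapping entry, the staged rows and the function name `transform_gender` are invented for illustration. Note that the join needs only two columns of the staged table, which is the motivation for populating just those columns.

```python
# Mapping table: (source system, source gender code) -> DWH standard value.
# Only the SAP / "0" -> "male" entry is from the slide; the rest is invented.
gender_mapping = {
    ("SAP", "0"): "male",
    ("SAP", "1"): "female",
    ("JD Edwards", "M"): "male",
    ("JD Edwards", "F"): "female",
}

# Staged source rows (made-up data).
staged = [
    {"id": 1, "sourcesys": "SAP", "gender": "0"},
    {"id": 2, "sourcesys": "JD Edwards", "gender": "F"},
    {"id": 3, "sourcesys": "SAP", "gender": "9"},
]


def transform_gender(rows, mapping):
    """Join each staged row to the mapping table; unmapped codes yield None."""
    return [
        {"id": row["id"], "dwh_gender": mapping.get((row["sourcesys"], row["gender"]))}
        for row in rows
    ]


transformed = transform_gender(staged, gender_mapping)
print(transformed)
```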
Key Transformations
Typical scenario
- transformation of source-dependent natural/business keys into DWH-owned surrogate representation
- reverse lookups for data mart loading
- multiple (outer) joins against target tables
- typical case of (outer) joins without aggregation
Challenge
- staging tables initially not in column store
Strategy
- populate only rows to be transformed into column store
- check population time vs. speed gain
- also works with lookup tables in columnar and staging table in row format
Example Key Lookup Query

select
  s.invoicenumber, s.year, s.audit_id, s.cutoffdt,
  r.id invoice_id, m.id member_id               -- 4. return DWH-IDs plus some other stuff
from (select * from st_db_rechnung_in_t
      where rownum < 100000) s,                 -- 1. scan staging table
     db_rechnung_ht r,
     pv_mitglied_ht m
where s.invoicenumber = m.invoicenumber (+)     -- 3. outer join to lookup tables
  and s.invoicenumber = r.invoicenumber (+)
  and s.year = r.year (+)
  and s.incoiceitem = r.incoiceitem (+)
  and s.srcmodifieddt > SYSDATE-720             -- 2. take last 2 years
Chaining Of Bloom Filters

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                     | 99999 | 9081K |       |  2848   (1)| 00:00:01 |
|*  1 |  HASH JOIN OUTER                |                     | 99999 | 9081K | 8008K |  2848   (1)| 00:00:01 |
|   2 |   JOIN FILTER CREATE            | :BF0000             | 99999 | 6835K |       |  1090   (1)| 00:00:01 |
|*  3 |    HASH JOIN OUTER              |                     | 99999 | 6835K | 6840K |  1090   (1)| 00:00:01 |
|   4 |     JOIN FILTER CREATE          | :BF0001             | 99999 | 5664K |       |   278   (1)| 00:00:01 |
|*  5 |      VIEW                       |                     | 99999 | 5664K |       |   278   (1)| 00:00:01 |
|*  6 |       COUNT STOPKEY             |                     |       |       |       |            |          |
|   7 |        TABLE ACCESS FULL        | ST_DB_RECHNUNG_IN_T | 99999 | 3613K |       |   278   (1)| 00:00:01 |
|   8 |     JOIN FILTER USE             | :BF0001             |  395K | 4637K |       |    29   (7)| 00:00:01 |
|*  9 |      TABLE ACCESS INMEMORY FULL | PV_MITGLIED_HT      |  395K | 4637K |       |    29   (7)| 00:00:01 |
|  10 |   JOIN FILTER USE               | :BF0000             |  781K |   17M |       |    73  (13)| 00:00:01 |
|* 11 |    TABLE ACCESS INMEMORY FULL   | DB_RECHNUNG_HT      |  781K |   17M |       |    73  (13)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------
1. scan staging table
2. create Bloom filters
3. apply Bloom filters on lookup tables
4. hash join removes Bloom filter false positives
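The four steps annotated on the plan can be followed in a small Python sketch. This is a generic Bloom filter built on `hashlib`, written for illustration only; the filter size, the number of hash functions, the class and the toy tables are assumptions, and Oracle's internal filter is not documented here. The filter is built from the staging keys, applied while scanning the lookup table, and the final outer hash join discards any false positives that slipped through.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, hashes=3):
        self.size_bits, self.hashes, self.bits = size_bits, hashes, 0

    def _positions(self, key):
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


# Made-up data: three staged invoices and a lookup table of 200 invoice numbers.
staging = [{"invoicenumber": n} for n in (101, 205, 999)]
lookup = {n: 5000 + n for n in range(100, 300)}  # invoicenumber -> DWH id

# Steps 1-2: scan the staging table and create the Bloom filter from its join keys.
bloom = BloomFilter()
for row in staging:
    bloom.add(row["invoicenumber"])

# Step 3: apply the filter while scanning the lookup table.
candidates = {key: dwh_id for key, dwh_id in lookup.items() if bloom.might_contain(key)}

# Step 4: the outer hash join keeps every staged row and ignores false positives.
joined = [(row["invoicenumber"], candidates.get(row["invoicenumber"])) for row in staging]
print(len(candidates), joined)
```

Only the few lookup rows that pass the filter reach the join, while the staged row without a match still appears with an empty DWH id, as an outer join requires.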

Conclusions
Oracle In-Memory is a game changer in the DWH/BI market
- in contrast to niche players it is absolutely enterprise ready
- in contrast to the other big players its use requires no modifications
Therefore, In-Memory provides a big leap in performance with
- low risks
- low project, infrastructure, maintenance & development cost
However, In-Memory is no silver bullet
- Speed-up depends heavily on query complexity
- Good design of ETL processes & analyses remains important
- Powerful infrastructure is still required
  (think about using Oracle Engineered Systems)
