
Complex SQL Performance

© Copyright IBM Corporation 2013


Course materials may not be reproduced in whole or in part without the prior written permission of IBM. 5.3
Unit objectives
After completing this unit, you should be able to:
• Review Explain reports for costly sort operations
• Describe the differences between Nested Loop, Merge Scan and Hash Joins
• Create indexes required to support efficient Star Schema joins, including the Zigzag
join
• Plan the implementation of Refresh Immediate or Refresh Deferred Materialized
Query Tables to improve query performance
• Utilize the Design Advisor to analyze SQL statements and recommend new MQTs
• Describe the features of range-partitioned tables to support large DB2 tables using
multiple table spaces, including the roll-in and roll-out of data ranges
• Explain the difference between partitioned and non-partitioned indexes for a range-
partitioned table
• Implement partitioned indexes to improve performance when you roll data out or roll
data into a range-partitioned table
• Use the DB2 Explain tools to determine if partition elimination is being used to
improve access performance to large range-partitioned tables



Sort costs for Small versus Large results
Small result (about 51K rows, no I/O cost to sort):

SELECT * FROM ACCT
WHERE ACCT_GRP BETWEEN 100 AND 150
order by balance desc

Access Plan:
-----------
Total Cost: 51489.3
Query Degree: 1

      Rows
     RETURN
     (   1)
      Cost
       I/O
       |
     51013.8
     TBSCAN
     (   2)
     51489.3
      28588
       |
     51013.8
     SORT
     (   3)
     51489.3
      28588
       |
     51013.8
     TBSCAN
     (   4)
     51449.8
      28588
       |
     1e+006
 TABLE: ADMIN
      ACCT

Large result (about 700K rows, large I/O cost to sort):

SELECT * FROM ACCT
WHERE ACCT_GRP BETWEEN 100 AND 800
order by balance desc

Access Plan:
-----------
Total Cost: 345735
Query Degree: 1

      Rows
     RETURN
     (   1)
      Cost
       I/O
       |
     700982
     TBSCAN
     (   2)
     345735
      68646
       |
     700982
     SORT
     (   3)
     309824
      48617
       |
     700982
     TBSCAN
     (   4)
     51556.8
      28588
       |
     1e+006
 TABLE: ADMIN
      ACCT

Also Monitor:
• Buffer pool temporary data logical reads
• Buffer pool temporary data physical reads



Sort overflow in Explain details
SELECT * FROM ACCT
WHERE ACCT_GRP BETWEEN 100 AND 800 order by balance desc

3) SORT : (Sort)
Cumulative Total Cost: 309824
Cumulative CPU Cost: 5.46755e+009
Cumulative I/O Cost: 48617
Cumulative Re-Total Cost: 0
Cumulative Re-CPU Cost: 0
Cumulative Re-I/O Cost: 20029
Cumulative First Row Cost: 309824
Estimated Bufferpool Buffers: 48617

Arguments:
---------
DUPLWARN: (Duplicates Warning flag)
FALSE
NUMROWS : (Estimated number of rows)
700982
ROWWIDTH: (Estimated width of rows)
108
SORTKEY : (Sort Key column)
1: Q1.BALANCE(D)
SPILLED : (Pages spilled to bufferpool or disk)
20029    <-- Estimated Sort Overflow
TEMPSIZE: (Temporary Table Page Size)
4096
UNIQUE : (Uniqueness required flag)
FALSE
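
The SORT arguments above can be turned into rough sizes with simple arithmetic. The following Python sketch is only an illustration using the numbers from this Explain report; the function names are made up for this example and are not part of any DB2 tool.

```python
# Values copied from the SORT operator details shown above.
NUMROWS = 700982    # estimated number of rows entering the sort
ROWWIDTH = 108      # estimated row width in bytes
SPILLED = 20029     # estimated pages spilled to bufferpool or disk
TEMPSIZE = 4096     # temporary table page size in bytes


def sort_input_bytes(num_rows, row_width):
    """Approximate volume of data that has to be sorted."""
    return num_rows * row_width


def spilled_bytes(pages, page_size):
    """Approximate volume of temporary pages written by the sort overflow."""
    return pages * page_size


input_mb = sort_input_bytes(NUMROWS, ROWWIDTH) / (1024 * 1024)
spill_mb = spilled_bytes(SPILLED, TEMPSIZE) / (1024 * 1024)
print(f"sort input about {input_mb:.1f} MB, estimated spill about {spill_mb:.1f} MB")
```

Because the estimated spill is about as large as the sort input itself, the optimizer expects essentially the whole sort to overflow, which is why the temporary data buffer pool reads are worth monitoring.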



Nested Loop Join

OUTER TABLE (OCCUPATIONS)         INNER TABLE (SIBLINGS)

DESCRIPTION            JOB_ID     JOB_ID  FIRSTNAME
Computer Analyst       1          8       Debbie
Interpreter            2          8       Tad
Civil Engineer         3          1       David
Biotechnician          4          4       Doreen
Medical Technologist   5          2       Donna
Police                 6          1       Doug
Nurse                  7          3       Donald
CPA                    8          5       Debbie
                                  6       Dennis
                                  7       Diane

Figure labels: INDEX ON SIBLINGS (JOB_ID); NO INDEX AVAILABLE OR USED

SELECT *
FROM SIBLINGS S,
OCCUPATIONS O
WHERE
S.JOB_ID = O.JOB_ID
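
As a rough illustration of the nested loop idea, the Python sketch below joins the two sample tables from this slide. It is a simplified model, not DB2's implementation: the dictionary stands in for an index on SIBLINGS (JOB_ID), and the function names are invented for this example.

```python
OCCUPATIONS = [  # outer table: (DESCRIPTION, JOB_ID)
    ("Computer Analyst", 1), ("Interpreter", 2), ("Civil Engineer", 3),
    ("Biotechnician", 4), ("Medical Technologist", 5), ("Police", 6),
    ("Nurse", 7), ("CPA", 8),
]
SIBLINGS = [  # inner table: (JOB_ID, FIRSTNAME)
    (8, "Debbie"), (8, "Tad"), (1, "David"), (4, "Doreen"), (2, "Donna"),
    (1, "Doug"), (3, "Donald"), (5, "Debbie"), (6, "Dennis"), (7, "Diane"),
]


def nested_loop_join_scan(outer, inner):
    """No index on the inner table: scan all inner rows for every outer row."""
    result = []
    for description, job_id in outer:
        for inner_job_id, firstname in inner:
            if inner_job_id == job_id:
                result.append((description, job_id, firstname))
    return result


def nested_loop_join_indexed(outer, inner):
    """Index on the inner join column: one lookup per outer row."""
    index = {}  # stand-in for an index on SIBLINGS (JOB_ID)
    for inner_job_id, firstname in inner:
        index.setdefault(inner_job_id, []).append(firstname)
    result = []
    for description, job_id in outer:
        for firstname in index.get(job_id, []):
            result.append((description, job_id, firstname))
    return result


scan_rows = nested_loop_join_scan(OCCUPATIONS, SIBLINGS)
indexed_rows = nested_loop_join_indexed(OCCUPATIONS, SIBLINGS)
print(len(scan_rows), "joined rows")
```

Both variants return the same rows; the difference is how much of the inner table is touched for each outer row, which is why an index on the inner join column matters for this join method.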
Nested Loop Join Access Plan

SELECT HISTORY.BRANCH_ID, TELLER.TELLER_NAME, HISTORY.ACCTNAME,
       HISTORY.ACCT_ID, HISTORY.BALANCE
FROM HISTORY AS HISTORY, TELLER AS TELLER
WHERE HISTORY.TELLER_ID = TELLER.TELLER_ID AND HISTORY.BRANCH_ID = 25
ORDER BY HISTORY.BRANCH_ID ASC, HISTORY.ACCT_ID ASC ;

                  794.52
                  TBSCAN
                  (   2)
                  3827.09
                  317.376
                    |
                  794.52
                  SORT
                  (   3)
                  3827.02
                  317.376
                    |
                  794.52
                  NLJOIN
                  (   4)
                  3826.48
                  317.376
           /----------+---------\
        1000                   0.79452
        FETCH                  FETCH
        (   5)                 (   7)
        159.046                23.0878
          33                   1.79452
      /---+---\               /---+---\
   1000      1000         0.79452     79452
  IXSCAN  TABLE: ADMIN    IXSCAN   TABLE: ADMIN
  (   6)     TELLER       (   8)     HISTORY
  52.0664                 12.8738
     4                       1
     |                       |
   1000                    79452
INDEX: ADMIN            INDEX: ADMIN
  TELLINDX                HISTIX1

OUTER Table: left branch (TELLER). INNER Table: right branch (HISTORY).
• Inner Table access cost is based on matching one outer table row.
• Join Cost is adjusted based on index clustering.



Merge Scan Join

Both tables are shown ordered on the join column:

DESCRIPTION            JOB_ID     JOB_ID  FIRSTNAME
Computer Analyst       1          1       David
Interpreter            2          1       Doug
Civil Engineer         3          2       Donna
Biotechnician          4          3       Donald
Medical Technologist   5          4       Doreen
Police                 6          5       Debbie
Nurse                  7          6       Dennis
CPA                    8          7       Diane
                                  8       Debbie
                                  8       Tad

Figure label on the two rows with JOB_ID 8: If Duplicate Join Value

• JOIN SATISFIED THROUGH TABLES ORDERED ON JOIN COLUMN(S), INDEXES
COULD REDUCE SORT COSTS.

SELECT *
FROM SIBLINGS S,
OCCUPATIONS O
WHERE
S.JOB_ID = O.JOB_ID
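
The merge step can be sketched in a few lines of Python. This is a simplified model under the assumption that both inputs fit in memory; the explicit `sorted` calls stand in for the ordering that an index or a SORT operator would provide, and the function name is hypothetical.

```python
OCCUPATIONS = [  # (DESCRIPTION, JOB_ID)
    ("Computer Analyst", 1), ("Interpreter", 2), ("Civil Engineer", 3),
    ("Biotechnician", 4), ("Medical Technologist", 5), ("Police", 6),
    ("Nurse", 7), ("CPA", 8),
]
SIBLINGS = [  # (JOB_ID, FIRSTNAME)
    (8, "Debbie"), (8, "Tad"), (1, "David"), (4, "Doreen"), (2, "Donna"),
    (1, "Doug"), (3, "Donald"), (5, "Debbie"), (6, "Dennis"), (7, "Diane"),
]


def merge_scan_join(outer, inner):
    """Join two inputs by walking both in join-column order."""
    outer = sorted(outer, key=lambda row: row[1])  # order by JOB_ID
    inner = sorted(inner, key=lambda row: row[0])  # order by JOB_ID
    result = []
    i = j = 0
    while i < len(outer) and j < len(inner):
        outer_key = outer[i][1]
        inner_key = inner[j][0]
        if outer_key < inner_key:
            i += 1
        elif outer_key > inner_key:
            j += 1
        else:
            # Emit every inner row that carries this (possibly duplicate) key.
            k = j
            while k < len(inner) and inner[k][0] == outer_key:
                result.append((outer[i][0], outer_key, inner[k][1]))
                k += 1
            # Keep j in place so a duplicate outer key can rescan the group.
            i += 1
    return result


merged_rows = merge_scan_join(OCCUPATIONS, SIBLINGS)
print(len(merged_rows), "joined rows")
```

Each input is read once in order, apart from rescanning a group of duplicate join values, so the main cost is producing the ordering in the first place.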



Merge Scan Join Access Plan
SELECT HISTORY.BRANCH_ID, TELLER.TELLER_NAME, HISTORY.ACCTNAME,
       HISTORY.ACCT_ID, HISTORY.BALANCE
FROM HISTORY AS HISTORY, TELLER AS TELLER
WHERE HISTORY.TELLER_ID = TELLER.TELLER_ID
AND HISTORY.BRANCH_ID = 25
ORDER BY HISTORY.BRANCH_ID ASC, HISTORY.ACCT_ID ASC ;

                794.52
                MSJOIN
                (   4)
                830.571
                  375
            /-------+------\
         1000             0.79452
         FETCH            FILTER
         (   5)           (   7)
         159.046          671.23
           33              342
       /---+---\            |
    1000      1000        794.52
   IXSCAN  TABLE: ADMIN   TBSCAN
   (   6)     TELLER      (   8)
   52.0664                671.23
      4                    342
      |                     |
    1000                  794.52
 INDEX: ADMIN             SORT
   TELLINDX               (   9)
                          671.229
                           342
                            |
                          794.52
                          TBSCAN
                          (  10)
                          670.873
                           342
                            |
                          79452
                      TABLE: ADMIN
                         HISTORY

OUTER Table (left branch, TELLER): Index Used for Ordering By Join Column
INNER Table (right branch, HISTORY): Sort Used for Ordering By Join Column



Hash Join

DESCRIPTION            JOB_ID     JOB_ID  FIRSTNAME
Civil Engineer         3          3       Debbie
Interpreter            2          4       Tad
Navigator              8          8       Gerhard
Biotechnician          4          8       Doreen
Medical Technologist   5          1       Donna
Police                 6          1       Doug
Nurse                  7          2       David
Computer Analyst       1          5       Debbie
                                  6       Dennis
                                  7       Diane

The JOB_ID, FIRSTNAME rows are grouped into hash buckets in the figure (Bucket 1, Bucket 2, Bucket 3).

SELECT *
FROM SIBLINGS S,
OCCUPATIONS O
WHERE
S.JOB_ID = O.JOB_ID
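
The bucket idea can be sketched as follows. This is only a conceptual model: the three buckets mirror the figure, but the hash function (JOB_ID modulo the bucket count) and the function names are assumptions made for this example, not DB2's actual hashing.

```python
NUM_BUCKETS = 3  # the figure shows three buckets

OCCUPATIONS = [  # probe input: (DESCRIPTION, JOB_ID)
    ("Civil Engineer", 3), ("Interpreter", 2), ("Navigator", 8),
    ("Biotechnician", 4), ("Medical Technologist", 5), ("Police", 6),
    ("Nurse", 7), ("Computer Analyst", 1),
]
SIBLINGS = [  # build input: (JOB_ID, FIRSTNAME)
    (3, "Debbie"), (4, "Tad"), (8, "Gerhard"), (8, "Doreen"), (1, "Donna"),
    (1, "Doug"), (2, "David"), (5, "Debbie"), (6, "Dennis"), (7, "Diane"),
]


def bucket_of(job_id):
    """Hypothetical hash function used only for this illustration."""
    return job_id % NUM_BUCKETS


def build_buckets(rows):
    """Build phase: distribute the build input over the hash buckets."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for job_id, firstname in rows:
        buckets[bucket_of(job_id)].append((job_id, firstname))
    return buckets


def hash_join(probe_rows, build_rows):
    """Probe phase: each probe row only looks at the one matching bucket."""
    buckets = build_buckets(build_rows)
    result = []
    for description, job_id in probe_rows:
        for build_job_id, firstname in buckets[bucket_of(job_id)]:
            if build_job_id == job_id:  # buckets can hold several keys
                result.append((description, job_id, firstname))
    return result


hashed_rows = hash_join(OCCUPATIONS, SIBLINGS)
print(len(hashed_rows), "joined rows")
```

No ordering of either input is needed, but the buckets have to be held in memory, which is where sort heap size and overflow to temporary space come into play.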



Hash Join Access Plan
SELECT HISTORY.BRANCH_ID, TELLER.TELLER_NAME, HISTORY.ACCTNAME,
       HISTORY.ACCT_ID, HISTORY.BALANCE
FROM HISTORY AS HISTORY, TELLER AS TELLER
WHERE HISTORY.TELLER_ID = TELLER.TELLER_ID AND HISTORY.BRANCH_ID = 25
ORDER BY HISTORY.BRANCH_ID ASC, HISTORY.ACCT_ID ASC ;

                     794.52
                     HSJOIN
                     (   4)
                     722.496
                     336.17
              /----------+---------\
          794.52                   1000
          FETCH                    FETCH
          (   5)                   (   9)
          563.302                  159.046
          303.17                     33
        /---+---\                /---+---\
    794.52     79452          1000      1000
    RIDSCN  TABLE: ADMIN     IXSCAN  TABLE: ADMIN
    (   6)    HISTORY        (  10)     TELLER
    13.6858                  52.0664
       1                        4
       |                        |
    794.52                    1000
    SORT                  INDEX: ADMIN
    (   7)                  TELLINDX
    13.6853
       1
       |
    794.52
    IXSCAN
    (   8)
    13.3792
       1
       |
     79452
 INDEX: ADMIN
    HISTIX1

OUTER Table: left branch (HISTORY). INNER Table: right branch (TELLER).
Figure annotations: Sortheap; Temp overflow



Star Schema Defined

• What is Star Schema?
– Simplest form of a dimensional model
• How the data is organized?
– Facts
– Dimensions
• A typical star schema based query
– Joins a subset of the dimensions with the fact tables
– Usually, there are no joins among the dimensions

Options:
• Cartesian join of dimensions
• Star join and dynamic bitmaps
• Zigzag join

[Figure: star schema. Fact table: Sales (Date_Id, Store_Id, Product_Id, Units_Sold). Dimension tables: Date (Id, Date, Day, Month, Quarter, Year), Store (Id, Store_Number, Province, Country), Product (Id, EAN_Code, Product_Name, Brand, Category). Also shown: Account (Id, Name, Company) and the label Snowflakes.]



Cartesian Join of dimensions

• Cartesian-hub join plan computes the Cartesian product of dimensions
• Each row in the Cartesian product is then used to probe
the multicolumn fact table index.

[Figure: a Nested Loop Join whose inputs are a Cartesian Join and the Multi-Column Fact Table Index (*). The Cartesian Join combines the Product Dimension with a second Cartesian Join of the Period Dimension and the Store Dimension.]
* Additional Prerequisite
Star Join and dynamic bitmaps

• A star join plan pre-filters the fact table by dimensions
to generate semi-joins
• Next, performs Index ANDs
using results of the semi-joins
• Next completes the semi-joins

[Figure: numbered processing steps. 1. The Store, Product and Period dimensions are each paired with a fact table index (* on store_id, product_id and period_id). 2. Nested Loop Joins. 3. (4.) IXAND. 5. RID Fetch from the FACT TABLE. 6. Joins with the Period, Product and Store dimensions.]
* Additional Prerequisite
Characteristics of a Zigzag join
• Joins a fact table and two or more dimension tables in a star schema,
using an index scan of the fact table
• It requires equality predicates between each dimension table and the
fact table.
• The join method calculates the Cartesian product of rows from the
dimension tables without actually materializing the Cartesian product
• Probes the fact table using a multicolumn index, so that the fact table is
filtered along two or more dimension tables simultaneously
• The probe into the fact table finds matching rows
• The zigzag join then returns the next combination of values that is
available from the fact table index
• This next combination of values, known as feedback, is used to skip
over probe values provided by the Cartesian product of dimension
tables that will not find a match in the fact table
• Filtering the fact table on two or more dimension tables simultaneously,
and skipping probes that are known to be unproductive, together make
the zigzag join an efficient method for querying large fact tables.
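
The probe and feedback behavior described above can be sketched in Python. This is a conceptual model under simplifying assumptions (two dimensions, an in-memory sorted list standing in for the multicolumn fact table index); the function name and sample keys are illustrative only and do not come from DB2.

```python
from bisect import bisect_left


def zigzag_join(d1_keys, d2_keys, fact_index):
    """Return (matching fact index entries, number of probes performed)."""
    d1 = sorted(set(d1_keys))
    d2 = sorted(set(d2_keys))
    index = sorted(fact_index)  # stand-in for the multicolumn index (f1, f2)
    matches = []
    probes = 0
    i = j = 0  # current position in the conceptual Cartesian product
    while i < len(d1) and d2:
        probe = (d1[i], d2[j])
        probes += 1
        pos = bisect_left(index, probe)
        while pos < len(index) and index[pos] == probe:
            matches.append(index[pos])
            pos += 1
        if pos == len(index):
            break  # no larger key exists in the fact table index
        f1, f2 = index[pos]  # feedback: next available combination
        # Skip every dimension combination that sorts before the feedback.
        i = bisect_left(d1, f1)
        if i == len(d1):
            break
        if d1[i] > f1:
            j = 0
        else:
            j = bisect_left(d2, f2)
            if j == len(d2):
                i += 1
                j = 0
    return matches, probes


D1 = [1, 2, 3, 4]
D2 = [1, 3, 4, 5]
FACT = [(1, 1), (2, 2), (3, 3), (3, 3), (4, 4), (5, 5)]
zz_matches, zz_probes = zigzag_join(D1, D2, FACT)
print(zz_matches, zz_probes)
```

With these sample keys the Cartesian product has 16 combinations, but the feedback from the index lets the sketch jump over most of them instead of probing each one.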
The Zigzag Join Method for Star Schema Based Queries
• How does it work?
– First forms the conceptual Cartesian product of dimensions but avoids
most non-productive probes from the Cartesian product into the fact
table
– Fact table index provides feedback to dimensions
– Zigzags through the dimensions and the fact tables
• Pre-requisite: A multi-column index on the fact table on columns that join with
the dimensions

[Figure: dimension keys d1 and d2, the Cartesian product of dimension keys, and the fact table multi column index on (f1, f2). Join: d1=f1 and d2=f2. Arrows mark each probe and match; unproductive probes are skipped.]



Example access plan with Zigzag join
SELECT income_level_desc,
sum(quantity_sold) "Quantity"
from daily_sales s,
customer c, period p
where
calendar_date between
'1996-03-01' and '1996-03-31'
and p.perkey = s.perkey
and s.custkey = c.custkey
and age_level = 7
group by income_level_desc;

                                2.6623e+06
                                  ZZJOIN
                                  (   5)
                                  7620.42
                                  5.37556
            +------------------+------------------+
          292.2              40000            0.227781
          TBSCAN             TBSCAN             FETCH
          (   6)             (   9)             (  13)
          56.2251            7596.78            11.8222
            1                 2.92              1.22778
            |                  |               /---+----\
          292.2              40000        0.227781   6.65576e+08
          TEMP               TEMP          IXSCAN    TABLE: POPS
          (   7)             (  10)        (  14)    DAILY_SALES
          30.4233            4235.52       9.93701       Q3
            1                 2.92            1
            |                  |              |
          292.2              40000        6.65576e+08
          IXSCAN             FETCH        INDEX: POPS
          (   8)             (  11)     PER_CUST_ST_PROMO
          29.9655            4235.07          Q3
            1                 2.92
            |              /---+----\
           2922         40000      1e+06
       INDEX: POPS     IXSCAN   TABLE: POPS
          PERX1        (  12)    CUSTOMER
           Q1          2763.52      Q2
                          1
                          |
                        1e+06
                     INDEX: POPS
                        CUSTX1
                          Q2
Ensuring that queries fit the required criteria for the
zigzag join
• Check that the tables included in the zigzag join fit the required criteria
– Each dimension table must have a primary key, a unique constraint,
or a unique index defined on it
• Write a suitable query
– Query must have equality join predicates between each dimension
table's unique index and the fact table columns
• Check for a suitable multicolumn index on the fact table
– The multicolumn index must include columns from the fact table that
have join predicates with dimension table columns
• Use the explain tool, db2exfmt, to check the access plan for zigzag join
• If the explain report does not include zigzag join
– Check for special messages in the “Extended Diagnostic Information”
section of db2exfmt

EXP0256I Analysis of the query shows that the query might execute faster
if an additional index was created to enable zigzag join.
Schema name: table-schema. Table name: table-name.
Column list: column-list.



Materialized Query Tables
• Materialized Query Tables are sometimes referred to as materialized views
– MQTs are defined using CREATE TABLE …… AS SELECT
• DB2 Optimizer's Query Rewrite will access the MQT instead of base tables if cost
can be reduced, with some restrictions:
– Query Optimization level must be 2, 5, 7 or 9
– Isolation level of CREATE ... AS must be greater than or equal to the runtime isolation level
• REFRESH IMMEDIATE:
– REFRESH TABLE or SET INTEGRITY can be used to initially populate the MQT.
– Changes made to underlying tables are cascaded immediately to MQT
– Consider maintenance cost of changes to underlying tables
– Might be considered for dynamic and static SQL
• REFRESH DEFERRED:
– MQT can only be considered for dynamic SQL
– Data in MQT can be refreshed at any time with the REFRESH TABLE statement or by
using SET INTEGRITY
– CURRENT REFRESH AGE special register can be set to ANY to allow the DB2 optimizer
to consider using a deferred refresh MQT to replace access to parent tables
– Database configuration option DFT_REFRESH_AGE can be set to ANY to avoid
requirement for each application to set refresh age



MQTs: Example
CREATE TABLE TP1SUM AS
(SELECT ACCT_GRP, COUNT(*) AS COUNTS,
SUM(BALANCE) AS GROUP_BALANCE
FROM ACCT
GROUP BY ACCT_GRP)
DATA INITIALLY DEFERRED REFRESH IMMEDIATE;

REFRESH TABLE TP1SUM;
RUNSTATS ON TABLE ADMIN.TP1SUM;

SELECT ACCT_GRP, SUM(BALANCE) AS SUM_BALANCE
FROM ACCT
WHERE ACCT_GRP BETWEEN 20 AND 80
GROUP BY ACCT_GRP
HAVING SUM(BALANCE) > 80000;

Access Plan:
-----------
      Rows
     RETURN
     (   1)
      Cost
       I/O
       |
     32.2464
     TBSCAN
     (   2)
     50.3867
        2
       |
       100
 TABLE: ADMIN
     TP1SUM



Performance issues with MQT Refresh options
• Refresh Immediate MQT:
– All changes to the base tables immediately update the dependent MQT which
adds processing to those SQL statements and increases cost and complexity
– Could create lock contention between transactions and reporting applications
– Cost impact is minimal if base tables are read only

• Refresh Deferred MQT:


– Changes to base tables are not propagated to MQT so the MQT needs to be
manually refreshed to reflect current data.
– Performance of SQL statements that update the base tables is not impacted by
creating MQTs
– Cost to refresh the MQT might be significant if there are large base tables

• Refresh Deferred MQT with staging table defined:


– Changes to base tables are immediately propagated to a staging table which adds
some processing to those SQL statements and increases cost and complexity
– Less likely to have lock contention, since the staging table would not be accessed
by reporting applications
– Allows for an incremental refresh of the MQT using the data in the staging table
which could be much more efficient than using large base tables directly.
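
The trade-off between the last two options can be pictured with a small Python model. It is purely conceptual: the class, its method names, and the use of in-memory lists are assumptions for illustration, and it says nothing about how DB2 stores or applies staging data internally.

```python
class DeferredMqtWithStaging:
    """Toy model of a deferred refresh summary table fed by a staging table."""

    def __init__(self):
        self.base_rows = []   # base table rows: (teller_id, balance)
        self.staging = []     # changes captured since the last refresh
        self.mqt = {}         # teller_id -> [total_balance, transactions]

    def insert(self, teller_id, balance):
        """Each base table insert also pays for one staging table insert."""
        self.base_rows.append((teller_id, balance))
        self.staging.append((teller_id, balance))

    def refresh(self):
        """Incremental refresh: apply only the staged changes, then prune."""
        applied = len(self.staging)
        for teller_id, balance in self.staging:
            summary = self.mqt.setdefault(teller_id, [0, 0])
            summary[0] += balance
            summary[1] += 1
        self.staging.clear()
        return applied


model = DeferredMqtWithStaging()
model.insert(10, 500)
model.insert(10, 250)
model.insert(20, 100)
before_refresh = dict(model.mqt)   # summary is still empty at this point
first_refresh = model.refresh()    # applies the three staged changes
model.insert(20, 50)
second_refresh = model.refresh()   # applies only the one new change
print(model.mqt)
```

The second refresh touches a single staged row rather than rescanning all base rows, which is the efficiency argument made in the last bullet; the price is the extra work on every insert.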
Example of defining a
deferred refresh MQT with a staging table
1. Define the MQT with deferred refresh. The MQT is based on summary
information from two tables.
Create table mqt2 as (
SELECT Teller.TELLER_ID, sum(HISTORY.BALANCE) as total_balance,
TELLER.TELLER_NAME , count(*) as transactions
FROM INST411.HISTORY AS HISTORY, INST411.TELLER AS TELLER
WHERE HISTORY.TELLER_ID = TELLER.TELLER_ID
GROUP BY TELLER.TELLER_ID, TELLER.TELLER_NAME )
data initially deferred refresh deferred in tp1sms ;

2. Create a staging table for the MQT.


CREATE TABLE stage2 for mqt2 propagate immediate in tp1sms ;

3. Use SET INTEGRITY to refresh the MQT and resolve Set Integrity Pending
for the staging table.
SET INTEGRITY FOR mqt2,stage2 IMMEDIATE CHECKED

4. Use RUNSTATS on the MQT to provide the optimizer current statistics


RUNSTATS on table inst411.mqt2

5. Use REFRESH TABLE to incrementally update the MQT using the staging
table. Use runstats to get current statistics for staging table first.
RUNSTATS on table inst411.stage2
REFRESH TABLE inst411.mqt2
Example of the access plan for an INSERT into a
parent table of a MQT with a staging table defined

Original Statement:
------------------
INSERT INTO HISTORY(ACCT_ID,
TELLER_ID
BRANCH_ID, BALANCE, DELTA, PID,
ACCTNAME, TEMP)
VALUES(:H00001
, :H00004 , :H00003 , :H00005 ,
:H00006, :H00008 , :H00007 , 'TP1ST ')

Insert into HISTORY Also Requires:
• Insert into Staging table
• Access to Teller Index

Partial access plan (Complete access plan in notes):

                      0.333333
                       TBSCAN
                       (   2)
                       30.2853
                          4
        +-----------------+------------------+
        1                 1                  1
     TBSCAN            INSERT         TABFNC: SYSIBM
     (   3)            (   7)             GENROW
     7.57432           22.7103
        1                 3
        |             /---+---\
        1            1         38
      TEMP         GRPBY   TABLE: INST411
     (   4)        (   8)      STAGE2
     7.5663        15.1473
        1             2
        |             |
        1             1
     INSERT        NLJOIN
     (   5)        (   9)
     7.56331       15.1472
        1             2
   /----+----\         /---+--\
  1        513576     1        1
TBSCAN  TABLE: INST411 TBSCAN  IXSCAN
(   6)     HISTORY   (  10)  (  11)
2.83407e-05          7.57432 7.57285
  0                   1        1
  |                   |        |
  1                   1       1000
TABFNC: SYSIBM      TEMP   INDEX: INST411
  GENROW           (   4)    TELLINDX
                   7.5663
                      1
Sample Explain report for a
REFRESH TABLE statement using a staging table
REFRESH TABLE and SET INTEGRITY Statements can be explained to better
understand the processing required.

• Staging table is accessed to update the MQT
• Data in the Staging table is deleted automatically

Original Statement:
------------------
refresh table mqt2

Partial access plan (Complete access plan in notes):

                         DELETE
                         (   8)
                         28510.9
                          3783
                        /---+--\
                     848        1000
                   UPDATE   TABLE: INST411
                   (   9)       MQT2
                    22098
                    2935
                  /---+--\
               848        1000
             NLJOIN   TABLE: INST411
             (  10)       MQT2
              15685
              2087
            /---+---\
         848       1.01626
        GRPBY      TBSCAN
        (  11)     (  16)
        15431.4    106.582
         2073        14
          |           |
         848        1000
        TBSCAN  TABLE: INST411
        (  12)      MQT2
        15431.3
         2073
          |
         848
        SORT
        (  13)
        15431.3
         2073
          |
         2035
        DELETE
        (  14)
        15430.1
         2073
       /---+---\
    2035       2035
   TBSCAN  TABLE: INST411
   (  15)     STAGE2
   40.6609
     38
      |
    2035
TABLE: INST411
    STAGE2



MQT evaluation and usage diagnostics
Extended Diagnostic Information:
--------------------------------
Diagnostic Identifier: 2
Diagnostic Details: EXP0022W Index has no statistics. The
index "CMCCAIN "."MQTIX1" has not had runstats run on it.
This can lead to poor cardinality and predicate filtering
estimates.

Diagnostic Identifier: 1
Diagnostic Details: EXP0020W Table has no statistics. The
table "CMCCAIN "."MQT1" has not had runstats run on it. This
may result in a sub-optimal access plan and poor performance.

Diagnostic Identifier: 3
Diagnostic Details: EXP0148W The following MQT or
statistical view was considered in query matching: "CMCCAIN
"."MQT1".

Diagnostic Identifier: 4
Diagnostic Details: EXP0149W The following MQT was used
(from those considered) in query matching: "CMCCAIN "."MQT1".



Using db2advis for MQT recommendations (1 of 2)

db2advis -d tp1 -file lab6sum.sql -type IM > lab6advise1.txt

execution started at timestamp 2009-09-25-14.11.30.440109


Using the default table space name USERSPACE1
found [1] SQL statements from the input file
Recommending indexes...
Recommending MQTs...
Found 1 user defined views in the catalog table
Found [2] candidate MQTs
Getting cost of workload with MQTs
total disk space needed for initial set [ 5.882] MB
total disk space constrained to [ 32.718] MB
Trying variations of the solution set.
Optimization finished.
1 indexes in current solution
1 MQTs in current solution
[8850.0000] timerons (without recommendations)
[ 8.0000] timerons (with current solution)
[99.91%] improvement--



Using db2advis for MQT recommendations (2 of 2)
-- LIST OF RECOMMENDED MQTs
-- ===========================
-- MQT MQT909251812330000 can be created as a refresh immediate MQT
-- mqt[1], 0.032MB
CREATE SUMMARY TABLE "INST411 "."MQT909251812330000"
AS (SELECT Q3.C0 AS "C0", Q3.C1 AS "C1", Q3.C2 AS
"C2" FROM TABLE(SELECT Q2.C0 AS "C0", SUM(Q2.C1) AS
"C1", COUNT(* ) AS "C2" FROM TABLE(SELECT Q1.ACCT_GRP
AS "C0", Q1.BALANCE AS "C1" FROM INST411.ACCT AS Q1)
AS Q2 GROUP BY Q2.C0) AS Q3) DATA INITIALLY DEFERRED
REFRESH IMMEDIATE IN TP1DMSAD ;
COMMIT WORK ;
REFRESH TABLE "INST411 "."MQT909251812330000" ;
COMMIT WORK ;
RUNSTATS ON TABLE "INST411 "."MQT909251812330000" WITH DISTRIBUTION;
COMMIT WORK ;

--
--
-- LIST OF RECOMMENDED INDEXES
-- ===========================
-- index[1], 0.036MB
CREATE INDEX "INST411 "."IDX909251812470000" ON "INST411
"."MQT909251812330000"
("C0" ASC, "C1" DESC) ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED
STATISTICS;
COMMIT WORK ;
Table partitioning: What is it and why use it?
• Allows a single logical table to be broken up into multiple separate
physical storage objects:
– Each corresponds to a partition of the table
– Partition boundaries correspond to specified value ranges in a specified
partition key
• Main Benefits:
– Allows for partition elimination during SQL processing
– Allows for optimized roll-in / roll-out processing (for example, minimized
logging)
– Allows for divide and conquer management of huge tables
– Allows for improved HSM integration

[Figure: Without Partitioning, a single SALESDATA table. With Partitioning, applications see a single table stored as SALESDATA JanPart, SALESDATA FebPart and SALESDATA MarPart.]



Considerations for creating a partitioned table
• What tables benefit from being partitioned:
– Large tables
– Roll-in/Roll-out
– Business intelligence style queries

• Which columns to partition on:


– Dates (roll-in)
– Partition elimination

• Granularity of ranges should match roll-in/roll-out


• Consider placing different ranges in different table spaces
• Table space assignment for indexes and large objects and
XML data



Storage mapping:
Mapping ranges to table spaces (1 of 2)
• Short syntax:
– The IN clause on CREATE TABLE now accepts a list
– In this example, the ranges will cycle through the provided
table spaces in round-robin fashion
– Data in the 1Q/2004 will be placed in tbsp1, 2Q/2004 in
tbsp2, 3Q/2004 in tbsp3, 4Q/2004 in tbsp1, etc.
– Table spaces must have the same Page size and Extent size

tbsp1: sales.1Q/04, sales.4Q/04
tbsp2: sales.2Q/04, sales.1Q/05
tbsp3: sales.3Q/04, sales.2Q/05

CREATE TABLE sales(sale_date DATE, customer INT, …)
IN TBSP1, TBSP2, TBSP3
PARTITION BY RANGE(sale_date)
(
STARTING '1/1/2004' ENDING '12/31/2008'
EVERY 3 MONTHS
);
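
The round-robin placement can be worked through with a short Python sketch. It only models the mapping described on this slide (quarterly ranges from 2004 through 2008 cycling over three table spaces); the function names are invented for the illustration.

```python
from datetime import date

TABLE_SPACES = ["TBSP1", "TBSP2", "TBSP3"]


def quarterly_range_starts(first_year, last_year):
    """Start date of every 3-month range between the two years, inclusive."""
    return [date(year, month, 1)
            for year in range(first_year, last_year + 1)
            for month in (1, 4, 7, 10)]


def assign_round_robin(range_starts, table_spaces):
    """Cycle through the table space list, one range at a time."""
    return [(start, table_spaces[n % len(table_spaces)])
            for n, start in enumerate(range_starts)]


placement = assign_round_robin(quarterly_range_starts(2004, 2008), TABLE_SPACES)
for start, table_space in placement[:6]:
    print(start.isoformat(), "->", table_space)
```

Five years of quarterly ranges give 20 data partitions, so the three table spaces end up holding seven, seven and six ranges respectively.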



Storage mapping:
Mapping ranges to table spaces (2 of 2)

• Long syntax:
– You can explicitly specify a table space for each data partition
– In this example:
• Sales data with dates prior to 1/1/2007 will be placed in tbspd1
• 1st quarter sales in tbspd2
• 2nd quarter sales in tbspd3
• 3rd quarter sales in tbspd4
• 4th quarter 2007 sales will be in tbspd5

TBSPD1: sales.rest
TBSPD2: sales.q1
TBSPD3: sales.q2
TBSPD4: sales.q3
TBSPD5: sales.q4

CREATE TABLE sales(sale_date DATE, customer INT, …)


PARTITION BY RANGE(sale_date)
(
PART rest STARTING MINVALUE IN TBSPD1,
PARTITION q1 STARTING '1/1/2007' IN TBSPD2,
PARTITION q2 STARTING '4/1/2007' IN TBSPD3,
PARTITION q3 STARTING '7/1/2007' IN TBSPD4,
PARTITION q4 STARTING '10/1/2007' ENDING '12/31/2007' IN TBSPD5
) INDEX IN TBSPI1 ;



Global (Non-partitioned) indexes
• Non-partitioned indexes:
– Each index contains entries of every row in the range-partitioned
table
– MDC Block indexes will be non-partitioned
– Each index is managed as a separate storage object
– CREATE INDEX IN tsname can be used to override location for new
indexes

[Figure: Index 1 (Sales Date) in IX Table space 1 and Index 2 (Product ID) in IX Table space 2 each span all four data ranges: Range 1 (2009 Q1) in Table space 1, Range 2 (2009 Q2) in Table space 2, Range 3 (2009 Q3) in Table space 3, Range 4 (2009 Q4) in Table space 4.]



Partitioned indexes
• Partitioned indexes:
– A unique partitioned index must contain the columns used in the PARTITION BY
RANGE clause
– Each index contains one index partition for each data range in the
range-partitioned table
– All partitioned indexes are managed as a single storage object per data range
– CREATE INDEX IN tsname can NOT be used to override the index location
– The INDEX IN clause of CREATE TABLE can be specified for a single data range
– INDEX IN clause can be specified for ALTER TABLE ADD PARTITION

[Figure: each data range has its own partition of Index 1 (Sales Date) and Index 2 (Product ID), kept together as One Storage Object per range: Range 1 (2009 Q1) in Table space 1, Range 2 (2009 Q2) in Table space 2, Range 3 (2009 Q3) in Table space 3, Range 4 (2009 Q4) in Table space 4.]


Example of creating partitioned indexes
• Placement for partitioned indexes can be specified using the long form
when the table is created
CREATE TABLE PARTTAB.HISTORYPART ( ACCT_ID INTEGER NOT NULL ,
TELLER_ID SMALLINT NOT NULL ,
BRANCH_ID SMALLINT NOT NULL ,
BALANCE DECIMAL(15,2) NOT NULL ,
……….
TEMP CHAR(6) NOT NULL )
PARTITION BY RANGE (BRANCH_ID)
(STARTING FROM (1) ENDING (20) IN TSHISTP1 INDEX IN TSHISTI1 ,
STARTING FROM (21) ENDING (40) IN TSHISTP2 INDEX IN TSHISTI2 ,
STARTING FROM (41) ENDING (60) IN TSHISTP3 INDEX IN TSHISTI3 ,
STARTING FROM (61) ENDING (80) IN TSHISTP4 INDEX IN TSHISTI4 ) ;

CREATE INDEX PARTTAB.HISTPIX1 ON PARTTAB.HISTORYPART (TELLER_ID)
PARTITIONED ;

CREATE INDEX PARTTAB.HISTPIX2 ON PARTTAB.HISTORYPART (BRANCH_ID)
PARTITIONED ;

• The IN tsname clause of the CREATE INDEX statement cannot be
used to specify the location for partitioned indexes



Partition elimination shown in DB2 Explain report
SQL Statement:

select * from historypart
where branch_id between 11 and 60
and teller_id between 800 and 810

Table ranges defined on Branch_id column
Predicate included: branch_id between 11 and 60

Section Code Page = 850

( 5) Access Table Name = INST411.HISTORYPART ID = -6,-32768
| Index Scan: Name = INST411.HISTPIX1 ID = 1
| | Regular Index (Not Clustered)
| | Index Columns:
| | | 1: TELLER_ID (Ascending)
| #Columns = 0
| Data-Partitioned Table
| Skip Inserted Rows
| Avoid Locking Committed Data
| Currently Committed for Cursor Stability
| Data Partition Elimination Info:
| | Range 1:
| | | #Key Columns = 1
| | | | Start Key: Inclusive Value
| | | | | 1: 11
| | | | Stop Key: Inclusive Value
| | | | | 1: 60
| Active Data Partitions: 0-2
| #Key Columns = 1
| | Start Key: Inclusive Value
| | | | 1: 800
| | Stop Key: Inclusive Value
| | | | 1: 810
| Index-Only Access
| Index Prefetch: None
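
The partition numbers reported above can be reproduced with a small Python sketch. It assumes the four BRANCH_ID ranges from the HISTORYPART example earlier in this unit (1-20, 21-40, 41-60, 61-80), numbered from 0; the function name is made up for this illustration.

```python
# BRANCH_ID ranges of the four data partitions, numbered from 0.
PARTITION_RANGES = [(1, 20), (21, 40), (41, 60), (61, 80)]


def active_partitions(ranges, low, high):
    """Partitions whose key range overlaps the BETWEEN low AND high predicate."""
    return [number
            for number, (start, end) in enumerate(ranges)
            if start <= high and end >= low]


# Predicate from the statement above: branch_id between 11 and 60
active = active_partitions(PARTITION_RANGES, 11, 60)
print("Active Data Partitions:", active)
```

Only the partition holding BRANCH_ID 61 to 80 is skipped, which corresponds to the "Active Data Partitions: 0-2" line in the report.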



Access Plan example using
Partitioned and Non-partitioned indexes
With Partitioned Indexes (Lower I/O Cost, Using 1 Index):

           4750
           FETCH
           (   4)
           283.611
           143.98
         /----+-----\
    21410.3        472956
    RIDSCN    DP-TABLE: PARTTAB
    (   5)       HISTORYPART
    143.274          Q1
    16.8401
       |
    21410.3
    SORT
    (   6)
    143.274
    16.8401
       |
    21410.3
    IXSCAN
    (   7)
    135.064
    16.8401
       |
     472956
 INDEX: PARTTAB
    HISTPIX1
       Q1

With Non-Partitioned Indexes (Higher I/O Cost, Uses 2 indexes):

               4845.46
               FETCH
               (   4)
               2525.48
               2365.51
             /----+-----\
        4845.46        472956
        RIDSCN    DP-TABLE: PARTTAB
        (   5)       HISTORYPART2
        445.129          Q1
        290.269
           |
        4845.46
        SORT
        (   6)
        445.129
        290.269
           |
        4845.46
        IXAND
        (   7)
        443.261
        290.269
       /-----+------\
   22976.8        49869.7
   IXSCAN         IXSCAN
   (   8)         (   9)
   210.907        229.713
   139.29         150.979
      |              |
    472956         472956
INDEX: PARTTAB  INDEX: PARTTAB
  HISTP2IX1       HISTP2IX2
      Q1             Q1
Parallel Scans on Partitioned Indexes

• Parallel scans provide an even distribution of work among subagents


– Balanced workload among subagents → efficient use of CPU resources
• Parallel scans can now be run against partitioned indexes
– Partitioned indexes are divided into ranges of records
– Each subagent is assigned a range of records; once it completes a range, it is
assigned a new one
– The index partitions are scanned sequentially

Range Partitions



Operations for Roll-out and Roll-in
• ALTER TABLE … DETACH:
– An existing range is split off as a stand alone table
– Data instantly becomes invisible
– Minimal interruption to other queries accessing table
• ALTER TABLE … ATTACH:
– Incorporates an existing table as a new range
– Follow with SET INTEGRITY to validate data and maintain any non-
partitioned indexes
• May utilize IMMEDIATE UNCHECKED to reduce delay
– Data becomes visible all at once after COMMIT for SET INTEGRITY
– Minimal interruption to other queries accessing table
• Key points:
– No data movement
– Nearly instantaneous
– SET INTEGRITY is now online



Roll-in summary
• LOAD / Insert into NewMonthSales
• (Perform ETL on NewMonthSales)
• ALTER TABLE Big_Table …
ATTACH PARTITION …
STARTING '03/01/2008'
ENDING '03/31/2008'
FROM TABLE NewMonthSales
– Very fast operation
– No data movement required
– Index maintenance deferred
• COMMIT
– New data still not visible
• SET INTEGRITY FOR Big_Table
……
– Potentially long running operation:
• Validates data
• Maintains Non-partitioned indexes,
MQTs
– Existing data available while it runs
• COMMIT
– New data visible

[Figure: after LOAD, Tablespace A holds Big_Table.p1, Tablespace B holds Big_Table.p2 and Tablespace C holds NewMonthSales. After ATTACH, Tablespace C holds Big_Table.p3.]
Roll-out summary
• ALTER TABLE Big_Table
DETACH PARTITION p3
INTO TABLE OldMonthSales
– Very fast operation
– No data movement required
– Index maintenance for non-partitioned
indexes performed asynchronously in
background
• DETACH is not allowed on a table that is
the parent of an enforced referential
integrity (RI) relationship.
• COMMIT:
– Detached data now invisible
– Detached partition ignored in index
scans
– Rest of Big_Table available
• SET INTEGRITY FOR Mqt1, Mqt2
– (Optional) maintains MQTs on Big_Table
• EXPORT OldMonthSales; DROP
OldMonthSales
– (Optional) this becomes a standalone
table that you can do whatever you want
with

[Figure: before DETACH, Table space A holds Big_Table.p1, Table space B holds Big_Table.p2 and Table space C holds Big_Table.p3. After DETACH, Table space C holds OldMonthSales.]
Attach or Detach using non-partitioned indexes
• When a new data range is attached:
– Index entries for new rows are added during SET INTEGRITY processing

• When a data range is detached:


– Index entries for the detached range must be removed by ASYNC Index Cleanup
– The Detached table does not have any indexes (except MDC block indexes)

[Figure: non-partitioned Index 1 (Sales Date) in IX Table space 1 and Index 2 (Product ID) in IX Table space 2 span data Range 1 (2009 Q1) to Range 4 (2009 Q4) in Table spaces 1 to 4. Range 1 is marked detach and Range 4 is marked attach.]


Attach or Detach using partitioned indexes
• When a new data range is attached:
– Reduced SET INTEGRITY processing: if matching indexes exist on the attached table, no
indexes are built for the new range during SET INTEGRITY processing.
– ERROR ON MISSING INDEXES option causes ATTACH to fail if the source table does
not have matching indexes. By default, any missing indexes will be created.

• When a data range is detached:


– Index entries for the detached range are assigned to the detached table, ASYNC index
processing is not needed.
– Partition indexes are retained and assigned default names during detach.

[Figure: each data range, Range 1 (2009 Q1) to Range 4 (2009 Q4) in Table spaces 1 to 4, carries its own partitions of Index 1 (Sales Date) and Index 2 (Product ID). Range 1 is marked detach and Range 4 is marked attach.]


Unit summary
Having completed this unit, you should be able to:
• Review Explain reports for costly sort operations
• Describe the differences between Nested Loop, Merge Scan and Hash Joins
• Create indexes required to support efficient Star Schema joins, including the Zigzag
join
• Plan the implementation of Refresh Immediate or Refresh Deferred Materialized
Query Tables to improve query performance
• Utilize the Design Advisor to analyze SQL statements and recommend new MQTs
• Describe the features of range-partitioned tables to support large DB2 tables using
multiple table spaces, including the roll-in and roll-out of data ranges
• Explain the difference between partitioned and non-partitioned indexes for a range-
partitioned table
• Implement partitioned indexes to improve performance when you roll data out or roll
data into a range-partitioned table
• Use the DB2 Explain tools to determine if partition elimination is being used to
improve access performance to large range-partitioned tables
