
DB2 ADAPTIVE COMPRESSION - Estimation, Implementation and Performance Improvement


Somraj Chakrabarty (somrajob@gmail.com) 13 March 2017
DBA Manager
Capgemini Technology Services India Limited

DB2 adaptive compression is a technique that compresses a database not only at the row level
but also at the page level, which makes it unique in achieving the best possible compression.
This article describes compression estimation, which helps decide whether a particular table
is worth compressing, the implementation of adaptive compression, and the performance
improvement achieved after implementation. The article walks through detailed steps for
compression estimation and implementation, and a performance review of selected queries
before and after the compression activity.

Introduction to Adaptive Compression


Database compression techniques evolved mainly to save substantial storage costs by
compressing database rows with the help of a compression dictionary, and to improve
performance by reducing I/O and memory consumption, especially when accessing large tables.
The term adaptive compression comes from the idea of adaptation: it combines classic row-level
compression with the newer concept of page-level compression. This technique further
compresses database pages on top of already-compressed rows, yielding a compression rate
much higher than classic row compression alone. Adaptive compression also reduces the need
for table REORG operations to rebuild the static dictionary as new data patterns are inserted.
Classic row compression typically provides compression rates of around 45% to 75%, whereas
around 70% to 85% can be achieved with adaptive compression, roughly a 30% better
compression rate. A few specific data types, such as LONG, LOB, index, and XML objects,
cannot be compressed.

Adaptive compression working procedure


Adaptive compression can be enabled using the COMPRESS option of the CREATE TABLE or
ALTER TABLE statement. Rows are compressed first using the table-level static compression
dictionary. As soon as a page (the basic building block of database storage) fills up, the data
within that page is also compressed using a page-level dictionary. If page data changes due to
insert, update, and delete operations, the page-level dictionary is automatically rebuilt (without
any REORG operation) to track the ever-changing data patterns.




Step 1: Table Level dictionary

Adaptive compression begins by following the classic compression rules, the first step of which
is creating the classic (static) dictionary. Static dictionary creation searches for frequently
occurring data across the entire table and replaces it with short symbols, saving storage to a
good extent.

The static dictionary can be created with a classic (offline) REORG. Alternatively, automatic
dictionary creation can be activated, which builds the dictionary automatically once the table
data grows past a threshold of about 1 to 2 MB. When adaptive compression is activated for a
table with pre-existing data, that data is not automatically compressed at the table level, only at
the page level. To compress the existing data at the table level, a full offline REORG is needed.
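As an example, the whole sequence (enable adaptive compression, rebuild the dictionary and
compress the existing rows, then refresh statistics) might look like the following sketch. The
commands are standard DB2 CLP commands, but the table name SC1.T1 is illustrative only:

```shell
# Enable adaptive (row- plus page-level) compression on the table
db2 "ALTER TABLE SC1.T1 COMPRESS YES ADAPTIVE"

# Offline REORG to build the static dictionary and compress existing rows
db2 "REORG TABLE SC1.T1 RESETDICTIONARY"

# Refresh statistics so the optimizer sees the new page counts
db2 "RUNSTATS ON TABLE SC1.T1 WITH DISTRIBUTION AND DETAILED INDEXES ALL"
```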

The figure below (Figure 1) represents a simple table with 6 rows, a repetitive data pattern, and
table-level compression. It also illustrates, in its simplest form, what a table-level compression
dictionary looks like.

Figure 1. Table Level Compression

Static dictionary building can be done with

• Offline Table REORG


• Automatic Dictionary Creation

OFFLINE TABLE REORG


The compression dictionary can be created with an offline table REORG, which then
compresses all existing table data according to that dictionary. The COMPRESS YES table
attribute must be turned on, either when the table is created or before compression by altering
the table DDL. This way, all of the table's rows take part in populating the compression
dictionary. An offline table REORG with RESETDICTIONARY or KEEPDICTIONARY will rebuild
the table from uncompressed to compressed data and vice versa. If the COMPRESS option is
set to YES for an index, the index pages are also compressed. The additional processing
performed by the REORG, creating the new compression dictionary and compressing all
existing table data, requires extra CPU cycles and I/O.

• RESETDICTIONARY
When an offline REORG runs with the RESETDICTIONARY clause on a table with the
COMPRESS YES attribute, a new dictionary is created if none exists, or the existing dictionary
is replaced with a new one. The table data is then compressed according to the new dictionary.
If, due to some specific requirement, the table attribute is changed to COMPRESS NO, a
subsequent offline REORG with RESETDICTIONARY removes the existing compression
dictionary and all the table data becomes uncompressed. For a table whose data is not yet
compressed, the REORG BUILD phase compresses the data. For a table whose data is already
compressed, the data is uncompressed, checked against the new dictionary format, and
compressed again according to the new dictionary during the REORG BUILD phase.

• KEEPDICTIONARY
When an offline REORG runs with the KEEPDICTIONARY clause on a table with the
COMPRESS YES attribute, a new dictionary is created if none exists; if a dictionary is already
available, no new one is created and the old dictionary remains in effect. The table data is
compressed according to that existing dictionary during the REORG BUILD phase. If, due to
some specific requirement, the table attribute is changed to COMPRESS NO, a subsequent
offline REORG with KEEPDICTIONARY removes the existing compression dictionary and all
the table data becomes uncompressed.
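Assuming the same hypothetical table SC1.T1 with the COMPRESS YES attribute, the two
REORG variants can be sketched as:

```shell
# Rebuild the dictionary from the current data (maximum compression; existing
# rows are uncompressed and recompressed during the BUILD phase):
db2 "REORG TABLE SC1.T1 RESETDICTIONARY"

# Keep the existing dictionary and recompress rows against it:
db2 "REORG TABLE SC1.T1 KEEPDICTIONARY"
```

RESETDICTIONARY is the usual choice when the data pattern has drifted significantly since the
dictionary was last built.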

AUTOMATIC DICTIONARY CREATION (ADC)


The automatic dictionary creation (ADC) feature creates the compression dictionary
automatically, without running a REORG, and compresses data according to it. While data is
being populated into a table with the compression attribute set to YES (through INSERT, LOAD,
IMPORT, or REDISTRIBUTE operations), a compression dictionary is created automatically (if
one does not already exist) and inserted into the table. Data inserted after the dictionary is
created is compressed according to it. When ADC kicks in depends on the amount of table
data; dictionary creation must be triggered at an optimal data threshold. Too low a threshold
makes the dictionary less effective than one built from a large amount of data with a variety of
data frequencies. Too high a threshold leaves a large amount of data uncompressed. Dictionary
creation can also affect system performance: for example, an insert operation can slow down if
the data threshold is crossed during the insert, because dictionary population and the insert
proceed in sync.

Step 2: Page Level dictionary


As soon as the table-level static dictionary is created, adaptive compression initiates page-level
dictionary creation. Repetitive data patterns can exist within a single page yet not be captured
during static dictionary creation. Page data can also change just after the static dictionary is
created, due to insert or update operations, so the static dictionary misses those patterns. A
page-level dictionary contains only the repetitive patterns within its own page. Because
page-level compression happens on top of data already compressed at the table level, the
adaptive compression algorithm checks, for each page, whether creating a page-level dictionary
would provide additional compression. If so, the page-level compression dictionary is created,
inserted into that specific page, and kept there as a special record.

Day-to-day INSERT, UPDATE, and DELETE operations may introduce or remove specific
repetitive data patterns, but the page-level dictionary is updated only when there is a clear
possibility of achieving significantly higher compression, so that system performance is not
affected by constant dictionary updates. If repetitive data within a page is so rare that keeping a
dictionary would actually consume more storage than leaving the page uncompressed, then no
compression dictionary is created and the page is not compressed. This is similar to table-level
compression, where a row may or may not be compressed depending on its data pattern.
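Whether a table is using adaptive compression can be verified from the catalog: in
SYSCAT.TABLES, ROWCOMPMODE is 'A' for adaptive, 'S' for static (classic) compression, and
blank for none, while PCTPAGESSAVED (populated by RUNSTATS) reflects the achieved page
savings. A quick check for a hypothetical schema SC1:

```shell
db2 "SELECT SUBSTR(TABNAME, 1, 20) AS TABNAME, ROWCOMPMODE, PCTPAGESSAVED
       FROM SYSCAT.TABLES
      WHERE TABSCHEMA = 'SC1'"
```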

The figure below (Figure 2) represents a simple table with 6 rows and a repetitive data pattern,
with page-level compression applied to the row-level-compressed table. It also illustrates, in its
simplest form, what a page-level compression dictionary looks like.

Figure 2. Page Level Compression

Adaptive Compression in COLUMN Organized Tables:


In a column-organized table, the column dictionaries depend entirely on the table data. There
are three options for loading a column-organized table, as below.

WITH LOAD UTILITY:

• RESETDICTIONARY:
This is the default option when loading data into a column-organized table with LOAD
REPLACE. It ensures that new column dictionaries are built.

• KEEPDICTIONARY:
This option is used when loading data into a table that already contains data with the LOAD
INSERT option. It can also be used with LOAD REPLACE if previously created column
dictionaries need to be kept intact. The newly inserted data is compressed according to the
available dictionaries, and the ANALYZE phase is skipped.

• RESETDICTIONARYONLY:
This option is unique to column-organized tables. Column dictionaries are created from the
input data, but no table rows are loaded, so the column dictionaries are built before any
insertion takes place.
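The three options above can be sketched as below for a hypothetical column-organized table
SC1.T1_COL and input file t1.del (both names are illustrative):

```shell
# Default for LOAD REPLACE: build new column dictionaries from the input
db2 "LOAD FROM t1.del OF DEL REPLACE RESETDICTIONARY INTO SC1.T1_COL"

# Keep previously built column dictionaries while replacing the data
db2 "LOAD FROM t1.del OF DEL REPLACE KEEPDICTIONARY INTO SC1.T1_COL"

# LOAD INSERT into a populated table keeps the existing dictionaries
db2 "LOAD FROM t1.del OF DEL INSERT INTO SC1.T1_COL"

# Build column dictionaries from the input data without loading any rows
db2 "LOAD FROM t1.del OF DEL REPLACE RESETDICTIONARYONLY INTO SC1.T1_COL"
```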

AUTOMATIC DICTIONARY CREATION (ADC):

Starting with DB2 V10.5 Fix Pack 1, ADC is available for column-organized tables. ADC uses a
small sample of data to create column dictionaries, whose effectiveness can deteriorate over
time. Page-level ADC has been enabled since DB2 V10.5 Fix Pack 4. The compression results
for column-organized tables can be checked with the SYSCAT.TABLES system catalog view: its
PCTPAGESSAVED column estimates the database storage saved relative to uncompressed
data. The RUNSTATS utility needs to be run to collect this information.
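For example, for a hypothetical column-organized table SC1.T1_COL, the saving estimate can
be refreshed and read back as:

```shell
db2 "RUNSTATS ON TABLE SC1.T1_COL"
db2 "SELECT SUBSTR(TABNAME, 1, 20) AS TABNAME, PCTPAGESSAVED
       FROM SYSCAT.TABLES
      WHERE TABSCHEMA = 'SC1' AND TABNAME = 'T1_COL'"
```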

Adaptive Compression Estimation:


Database compression was introduced in DB2 V9, and compression estimation can be done
with the INSPECT command using the ROWCOMPESTIMATE option. INSPECT reviews the
data in each row of the table, builds a compression dictionary from the available data patterns,
and uses that dictionary to estimate the storage that would be saved if the table were
compressed. Here is an example of the command:
db2 INSPECT ROWCOMPESTIMATE TABLE NAME SC1.T1 RESULTS T1.COMPRESS.out

The output file T1.COMPRESS.out is generated in the DB2 diagnostics log path. The output file
then needs to be formatted into a human-readable view with the db2inspf command, as below:
db2inspf T1.COMPRESS.out T1.COMPRESS.txt

In DB2 10.1 and later versions, compression estimation can be done with the
ADMIN_GET_TAB_COMPRESS_INFO() administrative function. This function can estimate
compression for all the tables within a database or schema, or for a single table.

This function can be leveraged as below to estimate both classic row compression and
adaptive row compression:


db2 "SELECT SUBSTR(TABSCHEMA, 1, 10) AS TABSCHEMA,
     SUBSTR(TABNAME, 1, 10) AS TABNAME, ROWCOMPMODE,
     PCTPAGESSAVED_CURRENT, AVGROWSIZE_CURRENT,
     PCTPAGESSAVED_STATIC, AVGROWSIZE_STATIC,
     PCTPAGESSAVED_ADAPTIVE, AVGROWSIZE_ADAPTIVE
     FROM TABLE(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO('SC1','T1'))"

TABSCHEMA  TABNAME  ROWCOMPMODE  PCTPAGESSAVED_CURRENT  AVGROWSIZE_CURRENT
---------  -------  -----------  ---------------------  ------------------
SC1        T1                                        0                  25

PCTPAGESSAVED_STATIC  AVGROWSIZE_STATIC  PCTPAGESSAVED_ADAPTIVE  AVGROWSIZE_ADAPTIVE
--------------------  -----------------  ----------------------  -------------------
                  33                 16                      42                   14

1 record(s) selected.

Let's analyze the above output.

Compression estimation was calculated for the table SC1.T1, where:

• ROWCOMPMODE is blank, which means no compression is activated for this table.
• PCTPAGESSAVED_CURRENT is 0: the current percentage of pages saved by row
compression is 0, since no compression is activated.
• AVGROWSIZE_CURRENT is 25: the current average record length is 25 bytes.
• PCTPAGESSAVED_STATIC is 33: the estimated percentage of pages saved by classic row
compression is 33%.
• AVGROWSIZE_STATIC is 16: the estimated average record length with classic row
compression is 16 bytes.
• PCTPAGESSAVED_ADAPTIVE is 42: the estimated percentage of pages saved by adaptive
row compression is 42%.
• AVGROWSIZE_ADAPTIVE is 14: the estimated average record length with adaptive row
compression is 14 bytes.

This result clearly shows the advantage of compression over an uncompressed table, and that
adaptive compression would perform better than classic row compression. The higher the
percentage, the greater the compression achieved.

Adaptive Compression Implementation:


Starting from DB2 V10.1 the ADAPTIVE COMPRESSION is the default row compression
technique.

In DB2 Version 10.1, a table enabled for adaptive compression can be created with either of the
queries below:
CREATE TABLE <Table Name> COMPRESS YES ADAPTIVE
CREATE TABLE <Table Name> COMPRESS YES


An existing table can also be altered to enable adaptive compression with either of the queries
below:
ALTER TABLE <Table Name> COMPRESS YES ADAPTIVE
ALTER TABLE <Table Name> COMPRESS YES

Both forms are equivalent because ADAPTIVE is the default value of the COMPRESS YES
clause, so either the CREATE or the ALTER statement holds good.

A database upgraded to V10.1 or a later version retains the existing compression settings of
tables already compressed with classic row compression. Those tables need to be altered to
activate adaptive compression. However, existing table rows are not compressed immediately;
an offline table reorganization is needed to compress all existing rows in the table. To achieve
the maximum possible compression, run the offline table reorganization with the
RESETDICTIONARY parameter. Alternatively, the ADMIN_MOVE_TABLE procedure can be
used instead of an offline table reorganization to avoid downtime.

The ADMIN_MOVE_TABLE() procedure can be used to move data from one table to another
residing in a different storage area while the data remains accessible. ADMIN_MOVE_TABLE()
builds the compression dictionary (rebuilding it if one already exists) and inserts it into the target
table. After the dictionary is inserted, the copy phase from the source to the target table begins,
so all the copied data is compressed according to the compression dictionary in the target table.

For the table SC1.T1, the procedure can be used as below for data movement
CALL SYSPROC.ADMIN_MOVE_TABLE('SC1','T1', '','','','','','','','','MOVE')

Storage Recovery post Adaptive Compression:


Adaptive compression results in storage page savings, and these free pages decrease the
storage usage percentage of the tablespace holding the compressed table. The number of
empty pages is reported as the "Free pages" value in the tablespace details, and the total free
storage can be calculated by multiplying the "Free pages" value by the "Page size (bytes)"
value, which yields a result in bytes that can then be converted to megabytes, gigabytes, and
so on.
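The "Free pages" and "Page size" figures can be read, for example, from the
MON_GET_TABLESPACE monitoring function; the query below is a sketch that computes the
free space in megabytes for every tablespace on all database members:

```shell
db2 "SELECT SUBSTR(TBSP_NAME, 1, 20) AS TBSP_NAME,
            TBSP_FREE_PAGES,
            TBSP_PAGE_SIZE,
            (TBSP_FREE_PAGES * TBSP_PAGE_SIZE) / 1048576 AS FREE_MB
       FROM TABLE(MON_GET_TABLESPACE(NULL, -2))"
```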

Keeping these free pages in the same tablespace's possession results in inefficient storage
usage, because tables residing in other tablespaces and/or other databases cannot use that
storage.

To overcome this, storage recovery should be performed to move the free pages (extents) from
the tablespace down to the filesystem level.

A DBA can perform this activity by lowering the high-water mark and altering the tablespace
with the different options below.

To reduce High-water mark the below command can be used


db2 "ALTER TABLESPACE <Tablespace name> LOWER HIGH WATER MARK"


To move free extents from a DMS tablespace, the command below can be used to reduce the
tablespace by an amount in gigabytes:
db2 "ALTER TABLESPACE <Tablespace name> REDUCE (ALL CONTAINERS <storage amount> G)"

To move free extents from an automatic storage tablespace, the commands below can be used.

For all available extent movement


db2 "ALTER TABLESPACE <Tablespace name> REDUCE MAX"

For specific percentage of extent movement


db2 "ALTER TABLESPACE <Tablespace name> REDUCE <numeric percentage value> percent"

For specific amount of storage (in Gigabyte) movement


db2 "ALTER TABLESPACE <Tablespace name> REDUCE <storage amount> G"

The ALTER TABLESPACE ... REDUCE command also lowers the high-water mark, so the
initial LOWER HIGH WATER MARK command can be omitted and the ALTER TABLESPACE
... REDUCE command alone can serve the purpose.

Adaptive Compression Effect on Performance:


Adaptive compression improves I/O efficiency because fewer pages are fetched from the table;
however, more CPU cycles are consumed compressing and expanding data rows during
SELECT and UPDATE operations.

Let's check the fetch time and CPU time before and after the compression activity on table
SC1.T1. Benchmark testing with the db2batch tool can be used to get the fetch time and CPU
time of a query run before and after adaptive compression is implemented, as below.

We have created a text file named db2batch_T1.txt, the contents of which are as below:
--#SET ROWS_OUT 0
select * from SC1.T1;
--#SET ROWS_OUT 1
select total_cpu_time from table(mon_get_connection(null,-1)) as con where
application_name ='db2batch.exe';

We ran the command below before and after the adaptive compression:
D:\>db2batch -d SAMPLE -f db2batch_T1.txt -i complete > T1_Batch_Before.txt
D:\>db2batch -d SAMPLE -f db2batch_T1.txt -i complete > T1_Batch_After.txt

Along with other specific details, we got the results below.

T1_Batch_Before.txt:

* Fetch Time is:   0.025205 seconds
* Elapsed Time is: 0.194815 seconds (complete)

TOTAL_CPU_TIME
--------------------
               15600

T1_Batch_After.txt:

* Fetch Time is:   0.021457 seconds
* Elapsed Time is: 0.064064 seconds (complete)

TOTAL_CPU_TIME
--------------------
               31201

From the above results we can see that, for a SELECT query, CPU time almost doubled after
adaptive compression was enabled on the table, while the fetch time for the compressed data
rows decreased.

Best Practice
• The compression estimation ratio should be checked before any compression
implementation. Compressing a table with a very low estimated compression ratio will not yield
a worthwhile performance improvement for the system.

• For existing tables and/or indexes, a suitable approach should be chosen for compression. If
downtime is restricted, the ADMIN_MOVE_TABLE() procedure can be used instead of an
offline reorganization.

• Storage recovery from the tablespace level down to the filesystem level must be performed to
take full advantage of the released storage. Storage saved by the compression activity is best
utilized from the filesystem by other databases, or by tables in the same or a different
tablespace within the database that the compressed table belongs to.

Acknowledgments
Special thanks to Manish Makwana for the review and advice towards writing this article.


Resources
• Learn more from Database Compression and Adaptive compression from IBM infocenter
• Stay current with developer technical events and webcasts focused on a variety of IBM
products and IT industry topics.
• Follow developerWorks on Twitter
• Get involved in the developerWorks Community. Connect with other developerWorks users
while you explore developer-driven blogs, forums, groups, and wikis.


About the author


Somraj Chakrabarty

Somraj Chakrabarty has a bachelor's degree in electronics and communication


engineering from NIT, Durgapur. He has around 8.5 years of experience as a DB2
LUW database administrator in the finance, retail, and manufacturing domain. He has
worked in technology companies like TCS and Infosys, and is currently associated
with CAPGEMINI India as DBA Manager. He mostly supports multiple projects in
performance tuning and design areas. He is a certified Advanced DB2 Database
Administrator.

© Copyright IBM Corporation 2017


(www.ibm.com/legal/copytrade.shtml)
Trademarks
(www.ibm.com/developerworks/ibm/trademarks/)
