DB2 adaptive compression compresses table data not only at the row (table) level but also at the
page level, which lets it reach a higher compression ratio than classic row compression alone. This
article describes how to estimate compression savings to make a go or no-go decision for a
particular table, how to implement adaptive compression, and what performance change was
observed after implementation. It gives detailed steps for compression estimation and
implementation, and reviews selected queries before and after the compression activity.
Adaptive compression starts out by following classic row compression. The first step is creation of
the classic, static dictionary. Static dictionary creation searches for frequently occurring data
across the entire table and replaces it with short symbols, which saves a substantial amount of
storage. The static dictionary can be created with a classic (offline) REORG. Alternatively,
automatic dictionary creation builds the dictionary automatically once the table data grows to a
threshold of 1 to 2 MB. When adaptive compression is activated for a table with pre-existing data,
that data is not compressed at the table level automatically, only at the page level. To compress
the existing data at the table level, a full offline REORG is needed.
Figure 1 shows a simple table with six rows and a repetitive data pattern, compressed at the table
level. The figure also shows, in its simplest form, what a table-level compression dictionary looks
like.
compressed data and vice versa. If the COMPRESS option is YES for an index, index pages are also
compressed. The additional work done by REORG, that is, creating the new compression dictionary
and compressing all existing table data, requires additional resources in the form of extra CPU
cycles and I/O.
• RESETDICTIONARY
When an offline REORG runs with the RESETDICTIONARY clause on a table that has the COMPRESS
YES attribute, a new dictionary is created if none exists, or the existing dictionary is replaced by
a new one. Data in the table is compressed according to the new compression dictionary. If the
table attribute has been changed to COMPRESS NO, a subsequent offline REORG with
RESETDICTIONARY removes the existing compression dictionary and all the table data becomes
uncompressed. For a table whose data is not yet compressed, the REORG BUILD phase compresses
the data. For a table whose data is already compressed, the data is first uncompressed and then
compressed again according to the new dictionary during the REORG BUILD phase.
• KEEPDICTIONARY
When an offline REORG runs with the KEEPDICTIONARY clause on a table that has the COMPRESS
YES attribute, a new dictionary is created only if none exists (and the table holds enough data,
roughly 1 to 2 MB); if a dictionary already exists, it is kept and no new one is built. Data in the
table is compressed according to that existing dictionary. If the table attribute has been changed
to COMPRESS NO, a subsequent offline REORG with KEEPDICTIONARY preserves the existing
compression dictionary but leaves all the table data uncompressed. Table data is compressed
during the REORG BUILD phase.
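As a minimal sketch of the two options above, the commands might be issued from the DB2 command
line processor as follows. SC1.T1 is the example table used later in this article, and it is assumed
to already carry the COMPRESS YES attribute; only one of the two commands would be run, depending
on whether the dictionary should be rebuilt or kept.

```sql
-- Assumption: SC1.T1 already has the COMPRESS YES attribute.

-- Option 1: build a new table-level dictionary and recompress all rows with it.
REORG TABLE SC1.T1 RESETDICTIONARY;

-- Option 2: keep the existing dictionary and compress rows with it.
REORG TABLE SC1.T1 KEEPDICTIONARY;
```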
not be taken into account during static dictionary creation. Page data can also change just after
static dictionary creation because of insert or update operations, so the static dictionary misses
the new data pattern. A page-level dictionary contains only patterns that repeat within the same
page. Because adaptive compression applies page-level compression on top of data that is already
compressed at the table level, the algorithm checks for each page whether creating a dictionary
would yield additional compression. If so, a page-level compression dictionary is created and
stored in that page, where it remains as a special record.
Day-to-day INSERT, UPDATE, and DELETE operations may introduce or remove repetitive data and
thus change the data pattern, but the page-level dictionary is updated only when a clearly higher
compression can be achieved and the update does not hurt system performance through overly
frequent dictionary rebuilds. If a page contains so little repetitive data that storing the
dictionary together with the compressed data would consume more space than the uncompressed
page, then no compression dictionary is created and the page is not compressed. This is similar to
table-level compression, where a row may or may not be compressed depending on the data pattern.
Figure 2 shows a simple table with six rows and a repetitive data pattern, with page-level
compression applied on top of the row-level compressed table. The figure also shows, in its
simplest form, what a page-level compression dictionary looks like.
• RESETDICTIONARY:
This is the default option when loading data into a column-organized table with LOAD REPLACE. It
ensures that new column dictionaries are built.
• KEEPDICTIONARY:
This option is meant for loading into a table that already contains data with the LOAD INSERT
option. It can also be used with LOAD REPLACE if the previously created column dictionaries
need to be kept intact. The newly loaded data is compressed according to the available
dictionaries, and the ANALYZE phase is not processed.
• RESETDICTIONARYONLY:
This option is unique to column-organized tables. Column dictionaries are created from the input
data without loading any table rows, so the column dictionaries are built before any rows are
inserted.
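The options above might be used with LOAD REPLACE as in the following sketch. The input file
t1_data.del is a hypothetical delimited file, and SC1.T1 is assumed here to be a column-organized
table; each command is an independent alternative.

```sql
-- Assumptions: t1_data.del is a hypothetical delimited input file and
-- SC1.T1 is a column-organized table.

-- Replace the table contents and build new column dictionaries.
LOAD FROM t1_data.del OF DEL REPLACE RESETDICTIONARY INTO SC1.T1;

-- Replace the table contents but compress with the existing dictionaries.
LOAD FROM t1_data.del OF DEL REPLACE KEEPDICTIONARY INTO SC1.T1;

-- Build column dictionaries from the input without loading any rows.
LOAD FROM t1_data.del OF DEL REPLACE RESETDICTIONARYONLY INTO SC1.T1;
```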
ADC is available starting with DB2 V10.5 Fix Pack 1. ADC uses a small sample of data to create
the column dictionaries, whose effectiveness can deteriorate over time. Page-level ADC has been
available since DB2 V10.5 Fix Pack 4. The compression achieved for column-organized tables can
be checked in the SYSCAT.TABLES system catalog view. The PCTPAGESSAVED column of
SYSCAT.TABLES gives an estimate of the storage saved relative to uncompressed data in the
column-organized table. The RUNSTATS utility must be run to collect this information.
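A minimal sketch of that check, assuming the example table SC1.T1 used elsewhere in this article:

```sql
-- Refresh the catalog statistics so that PCTPAGESSAVED is populated.
RUNSTATS ON TABLE SC1.T1;

-- Read the estimated percentage of pages saved by compression.
SELECT TABSCHEMA, TABNAME, PCTPAGESSAVED
  FROM SYSCAT.TABLES
 WHERE TABSCHEMA = 'SC1'
   AND TABNAME = 'T1';
```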
The output file T1.COMPRESS.out is generated in the DB2 diagnostic log path. The file is not
human-readable as written, so it must be formatted with the DB2INSPF command as
below
DB2INSPF T1.COMPRESS.out T1.COMPRESS.txt
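A results file such as T1.COMPRESS.out is typically produced by the INSPECT command with the
ROWCOMPESTIMATE option before it is formatted. The following sketch assumes the example table
SC1.T1; it is one plausible form of the command rather than the exact invocation used for this
article.

```sql
-- Assumption: estimate row compression savings for SC1.T1 and keep the
-- unformatted results file in the diagnostic data directory.
INSPECT ROWCOMPESTIMATE TABLE NAME T1 SCHEMA SC1 RESULTS KEEP T1.COMPRESS.out;
```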
In DB2 10.1 and later versions, compression can be estimated with the
ADMIN_GET_TAB_COMPRESS_INFO() administrative function. The function can produce estimates for
all tables in a database or schema, or for a single table.
It can be used as below to estimate both classic row compression and adaptive row
compression
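A query of roughly the following shape returns the estimate for a single table. This is a sketch
assuming the two-argument form of the function (schema name, table name) and the example table
SC1.T1; the selected columns are the ones discussed in this section.

```sql
-- Assumption: two-argument form of the function, restricted to SC1.T1.
SELECT TABSCHEMA,
       TABNAME,
       ROWCOMPMODE,
       PCTPAGESSAVED_CURRENT,
       AVGROWSIZE_CURRENT,
       PCTPAGESSAVED_STATIC,
       AVGROWSIZE_STATIC,
       PCTPAGESSAVED_ADAPTIVE,
       AVGROWSIZE_ADAPTIVE
  FROM TABLE(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO('SC1', 'T1')) AS T;
```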
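TABSCHEMA TABNAME ROWCOMPMODE PCTPAGESSAVED_CURRENT
--------- ------- ----------- ---------------------
SC1       T1                                      0

AVGROWSIZE_CURRENT PCTPAGESSAVED_STATIC AVGROWSIZE_STATIC
------------------ -------------------- -----------------
                25                   33                16

PCTPAGESSAVED_ADAPTIVE AVGROWSIZE_ADAPTIVE
---------------------- -------------------
                    42                  14

  1 record(s) selected.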
• ROWCOMPMODE is blank, which means that no compression of any kind is activated for this
table.
• PCTPAGESSAVED_CURRENT is 0: the current percentage of pages saved by row compression is 0,
because no compression is activated.
• AVGROWSIZE_CURRENT is 25: the current average record length is 25.
• PCTPAGESSAVED_STATIC is 33: the estimated percentage of pages saved by classic row
compression is 33%.
• AVGROWSIZE_STATIC is 16: the estimated average record length with classic row
compression is 16.
• PCTPAGESSAVED_ADAPTIVE is 42: the estimated percentage of pages saved by adaptive row
compression is 42%.
• AVGROWSIZE_ADAPTIVE is 14: the estimated average record length with adaptive row
compression is 14.
This result clearly shows the advantage of compression over the uncompressed table, and also
that adaptive compression would give a better result than classic row compression. The higher
the percentage of pages saved, the higher the compression ratio achieved.
In DB2 Version 10.1, a table enabled for adaptive compression can be created with either of the
statements below
CREATE TABLE <Table Name> COMPRESS YES ADAPTIVE
CREATE TABLE <Table Name> COMPRESS YES
An existing table can also be altered to enable adaptive compression with either of the statements
below
ALTER TABLE <Table Name> COMPRESS YES ADAPTIVE
ALTER TABLE <Table Name> COMPRESS YES
ADAPTIVE is the default for the COMPRESS YES clause, so both forms of the CREATE and ALTER
statements enable adaptive compression.
A database upgraded to V10.1 or a higher version retains the existing table compression settings,
so tables that were compressed with classic row compression stay that way. Those tables need to be
altered to activate adaptive compression. Existing table rows are not compressed immediately,
however. An offline table reorganization is needed to compress all existing rows in the table, and
it should be run with the RESETDICTIONARY parameter to achieve the maximum possible
compression. To avoid downtime, the ADMIN_MOVE_TABLE procedure can be used instead of an
offline table reorganization.
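The sequence described above might look like the following sketch for the example table SC1.T1.
The final query is an optional check added here for illustration; it reads the compression settings
and the pages saved from the catalog after statistics are refreshed.

```sql
-- Switch the table from classic row compression to adaptive compression.
ALTER TABLE SC1.T1 COMPRESS YES ADAPTIVE;

-- Rebuild the table-level dictionary and compress all existing rows (offline).
REORG TABLE SC1.T1 RESETDICTIONARY;

-- Refresh statistics, then confirm the compression mode and the savings.
RUNSTATS ON TABLE SC1.T1;

SELECT TABNAME, COMPRESSION, ROWCOMPMODE, PCTPAGESSAVED
  FROM SYSCAT.TABLES
 WHERE TABSCHEMA = 'SC1'
   AND TABNAME = 'T1';
```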
The ADMIN_MOVE_TABLE() procedure moves data from one table to another, which may reside in a
different storage area, while the data remains accessible. ADMIN_MOVE_TABLE() builds a
compression dictionary (or rebuilds it if one already exists) and inserts it into the target table.
After the dictionary is inserted, the copy phase from the source table to the target table starts.
As a consequence, all the inserted data in the target table is compressed according to the
compression dictionary.
For the table SC1.T1, the procedure can be called as below to move the data
CALL SYSPROC.ADMIN_MOVE_TABLE('SC1','T1', '','','','','','','','','MOVE')
Keeping these free pages allocated to the same table space results in inefficient use of storage,
because tables residing in other table spaces or other databases cannot use that storage.
To overcome this situation, storage recovery should be performed to move the free pages (extents)
from the table space back to the file system.
The DBA can perform this activity by lowering the high-water mark and altering the table space
with the appropriate option, as below.
To release free extents from a DMS table space, the command below reduces the table space by the
given amount in gigabytes
db2 "ALTER TABLESPACE <Tablespace name> REDUCE (ALL CONTAINERS <storage amount> G)"
Free extents can likewise be released from an automatic storage table space with the ALTER
TABLESPACE ... REDUCE statement.
The ALTER TABLESPACE ... REDUCE statement also lowers the high-water mark, so the separate
step of lowering the high-water mark can be omitted; the REDUCE statement alone serves the
purpose.
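For an automatic storage table space, the reduction might look like the sketch below. TS1 and TS2
are hypothetical table space names, and REDUCE MAX is one possible choice that releases as much
free space as possible; a specific size could be given instead. The second statement shows the
separate high-water mark step mentioned above, for cases where it is run on its own.

```sql
-- Assumption: TS1 is a hypothetical automatic storage table space.
-- Release as many free extents as possible back to the file system.
ALTER TABLESPACE TS1 REDUCE MAX;

-- Assumption: TS2 is a hypothetical table space with reclaimable storage.
-- Lower the high-water mark as a separate step before reducing containers.
ALTER TABLESPACE TS2 LOWER HIGH WATER MARK;
```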
Let's compare the fetch time and CPU time before and after the compression activity on table
SC1.T1. Benchmark testing with the db2batch tool can be used to capture the fetch time and CPU
time of a query run before and after the activity, as below.
We created a text file named db2batch_T1.txt with the following contents
--#SET ROWS_OUT 0
select * from SC1.T1;
--#SET ROWS_OUT 1
select total_cpu_time from table(mon_get_connection(null,-1)) as con where
application_name ='db2batch.exe';
We ran the command below once before and once after enabling adaptive compression
D:\>db2batch -d SAMPLE -f db2batch_T1.txt -i complete > T1_Batch_Before.txt
D:\>db2batch -d SAMPLE -f db2batch_T1.txt -i complete > T1_Batch_After.txt
T1_Batch_Before.txt:
TOTAL_CPU_TIME
--------------------
15600
T1_Batch_After.txt:
* Fetch Time is: 0.021457 seconds
* Elapsed Time is: 0.064064 seconds (complete)
TOTAL_CPU_TIME
--------------------
31201
From the results above we can see that, for a SELECT query, CPU time almost doubled (from 15600
to 31201) after adaptive compression was applied to the table. Fetch time, however, decreased for
the compressed data rows.
Best Practice
• The estimated compression ratio should be checked before any compression is implemented.
Compression with a very low compression ratio will not pay off in terms of system performance
improvement.
• For existing tables and indexes, plan the compression approach around the available downtime.
If downtime is restricted, the ADMIN_MOVE_TABLE() procedure can be used instead of an offline
reorganization.
• Storage recovery from the table space level to the file system level must be performed to take
full advantage of the released storage. Storage saved by compression is best utilized once it is
back in the file system, where it is available to other databases, or to tables in the same or a
different table space within the database that contains the compressed table.
Acknowledgments
Special thanks to Manish Makwana for the review and advice towards writing this article.