
IBM DB2 for Linux, UNIX, and Windows

Best Practices: Deep Compression


Thomas Fanghaenel, DB2 Software Development

Last updated: 2012-02-01

Deep Compression

Page 2

Contents

Disclaimer
Executive Summary
Introduction
Alternate Row Format with Value Compression
Row Compression
    Classic Row Compression
    Adaptive Row Compression
    Identifying Candidate Tables for Row Compression
    Estimating Storage Space Savings for Row Compression
Index Compression
    Identifying Candidate Indexes for Index Compression
    Estimating Storage Space Savings for Index Compression
Adoption Strategies
    Applying Compression to Existing Data
    Managing Disk Space Consumption
    Verifying Row Compression and Avoiding Common Pitfalls
    Verifying Index Compression
Compression of Temporary Tables
Compression of Backup Images and Log Archives
    Backup Compression
    Log Archive Compression
Conclusion
Further reading
Notices
    Trademarks


Disclaimer
IBM's statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM's sole discretion. Information regarding potential future products is intended to outline our general product direction, and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Executive Summary
The objective of this document is to communicate best practices for the use of DB2's Storage Optimization Feature, and to illustrate how this feature can be a crucial driver of a broader space-conscious storage strategy for your large OLTP workload or data warehouse. The Storage Optimization Feature allows you to apply compression to various types of persistently stored data as well as temporary data.

The benefits you can realize from using these best practices include the following:

- Your data consumes less storage space.
- Your storage is consumed at a slower rate.
- Your database performance improves, depending on your environment.


Introduction
DB2 for Linux, UNIX, and Windows provides various means for you to control, manage, and reduce the storage consumption of objects in your database through compression. Compression can be applied to primary user data (row and XML data, indexes), system-generated data (temporary tables), and administrative data (backup images and archived transaction logs). Advanced compression features for primary user and system-generated data are available through the Storage Optimization Feature. Compression facilities for administrative data are available in all editions of DB2.

This document presents the various compression methods integrated into DB2. For each method, it lists points worth considering when you determine whether to adopt that compression technique. It also provides detailed information about how to identify candidate tables and indexes for compression, and about the best practices for achieving maximum storage space savings when you first adopt compression techniques.


Alternate Row Format with Value Compression


Starting in DB2 for Linux, UNIX, and Windows Version 8, you can choose between two different row formats for your user-created tables. The row format determines how rows are packed when they are stored on disk. There is a standard and an alternate row format, and they can differ significantly in terms of their storage space requirements. The alternate row format allows for more compact storage of NULL and system default values, as well as zero-length values in columns with variable-length data types (VARCHAR, VARGRAPHIC, LONG VARCHAR, LONG VARGRAPHIC, BLOB, CLOB, DBCLOB, and XML). This concept is referred to as NULL and default value compression, or value compression for short.

The row format can be selected individually for each table. To create a table that uses the alternate row format, use the following statement:

CREATE TABLE ... VALUE COMPRESSION

To change the row format used in an existing table, use the following statements:

ALTER TABLE ... ACTIVATE VALUE COMPRESSION
ALTER TABLE ... DEACTIVATE VALUE COMPRESSION

To determine whether a table uses the alternate row format, query the COMPRESSION column in SYSCAT.TABLES. You will see a value of V or B if the table uses the alternate row format, the latter indicating that row compression is also enabled on that table. If you activate or deactivate value compression for an existing table, the existing data is not modified, and the rows remain in their current row format. That is, unless you apply any of the adoption strategies listed later in this document.

In the standard row format, space for fixed-length column values is still allocated even if the actual stored value is NULL. Similarly, zero-length values stored in columns with variable-length data types still require a small overhead. In a table that uses the alternate row format, on the other hand, NULL values in all columns and zero-length values in columns with variable-length data types consume minimal space.
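As a concrete illustration, the sketch below creates a hypothetical sparsely populated table with the alternate row format and then checks the catalog. The table and column names are invented for this example.

```sql
-- Hypothetical table: several columns are expected to remain NULL or
-- zero-length in most rows, so the alternate row format can pay off.
CREATE TABLE DB2INST1.CONTACT (
    ID      INTEGER      NOT NULL,
    NAME    VARCHAR(60)  NOT NULL,
    PHONE   VARCHAR(20),           -- often NULL
    FAX     VARCHAR(20),           -- almost always NULL
    REMARKS VARCHAR(200)           -- often zero-length
) VALUE COMPRESSION;

-- Switch an existing table to the alternate row format:
ALTER TABLE DB2INST1.CONTACT ACTIVATE VALUE COMPRESSION;

-- Verify: COMPRESSION is 'V' (value compression only)
-- or 'B' (value compression plus row compression).
SELECT COMPRESSION
  FROM SYSCAT.TABLES
 WHERE TABSCHEMA = 'DB2INST1' AND TABNAME = 'CONTACT';
```

Remember that activating value compression on an existing table changes only the format of rows that are inserted or updated afterwards.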
For tables that use the alternate row format, additional space savings can be generated by enabling default value compression for fixed-length numeric and character columns. This will result in system default values (0 for numeric columns, blank for fixed-length character columns) not being materialized in the on-disk representation of the row. Compression of system default values has to be enabled for each column individually by specifying the COMPRESS SYSTEM DEFAULT column option for that column in a CREATE TABLE or ALTER TABLE ALTER COLUMN statement.
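Building on that, the COMPRESS SYSTEM DEFAULT option is specified per column. The following is a sketch with an invented table; it assumes the table also uses the alternate row format, since system default compression applies only there.

```sql
-- COMPRESS SYSTEM DEFAULT is a per-column option for tables that use
-- the alternate row format (VALUE COMPRESSION).
CREATE TABLE DB2INST1.ORDERS (
    ORDER_ID INTEGER NOT NULL,
    QUANTITY INTEGER NOT NULL COMPRESS SYSTEM DEFAULT,  -- 0 not materialized
    REGION   CHAR(8) NOT NULL COMPRESS SYSTEM DEFAULT   -- blank not materialized
) VALUE COMPRESSION;

-- Enable it on an existing column:
ALTER TABLE DB2INST1.ORDERS
    ALTER COLUMN QUANTITY COMPRESS SYSTEM DEFAULT;
```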


When compared to the standard row format, the alternate row format also reduces the storage overhead for all other values in columns with variable-length data types. At the same time, the storage consumption for all non-NULL values stored in fixed-length columns increases. Formulas to determine the byte counts for values of all supported data types and row formats can be found in the reference material.

While the standard row format is a good choice in most cases, there are some characteristic tables for which the alternate row format yields a more compact storage layout. Sparsely populated tables, that is, tables that contain many rows with many NULL or system default values, should use the alternate row format. Be aware, however, that the storage space requirements for a row increase whenever a NULL or system default value is subsequently updated to a value other than NULL or the system default. This generally causes pointer-overflow pairs to occur, even in tables with no variable-length columns, that is, tables in which all rows have the same space requirements when the standard row format is used. Tables in which the vast majority of columns have variable-length data types should also use the alternate row format.

For some tables that do not fit those characteristics, using the alternate row format may result in increased storage space requirements. A test in your environment with your specific data may be worthwhile. Unlike some of the advanced compression features, neither row format incurs additional processing overhead. Hence, value compression can be used even in situations where your workload is largely CPU-bound.

You can use the alternate row format regardless of whether you have licensed the Storage Optimization Feature. This allows you to choose the most compact storage layout for each table, even if you are not planning on using row compression. If you are planning on using row compression, choosing the more compact row format to start with still yields a smaller on-disk footprint in most cases, though the effect will be less dramatic or even negligible. This is because row compression compresses the materialized NULL and system default values in tables with the standard row format very well.

Row Compression
Row compression was first introduced in DB2 Version 9.1. Significant improvements to the row compression functionality have been made in every release since, culminating in the next-generation adaptive compression functionality in DB2 Galileo. Row compression requires a license for the Storage Optimization Feature.

Storage space savings from compression translate into fewer physical I/O operations being necessary to read the data in a compressed table, since the same number of rows is stored on fewer pages. Compression also results in a higher buffer pool hit ratio, since more data rows can be packed into the same number of pages. In many cases, this translates into higher throughput and faster query execution times.

Row compression can be enabled individually for each table, and it results in space savings for the vast majority of practical tables. Typical compression savings range from about 50% to 80% or more. Furthermore, it is guaranteed that the storage footprint of a table that uses row compression will not exceed that of the uncompressed version of the same table.

Starting in DB2 Galileo, there are two flavors of row compression. Classic row compression refers to the same row compression technology that has been used for user table data since DB2 Version 9.1, and for XML and temporary data since DB2 Version 9.7. Adaptive row compression is a new compression mode in DB2 Galileo that can be applied to user table data. Adaptive row compression is superior to classic row compression in that it generally achieves better compression and requires less database maintenance to keep the compression ratio near an optimal level.

Classic Row Compression


The classic row compression functionality introduced in DB2 Version 9.1 is based on a dictionary-based compression algorithm. There is one compression dictionary for each table object. The dictionary contains a mapping of patterns that frequently occur in rows throughout the whole table, and it is also referred to as the table-level compression dictionary. In order for row compression to occur, the table must be enabled for compression, and a dictionary must exist for the table object.

To enable a table for classic row compression in DB2 Galileo at table creation time, use the following statement:

CREATE TABLE ... COMPRESS YES STATIC

To change the compression mode for an existing table, use the following:

ALTER TABLE ... COMPRESS YES STATIC

Note that in DB2 Galileo the STATIC qualifier in the COMPRESS YES clause is mandatory in both cases. In earlier versions of DB2, you would use the COMPRESS YES clause without any further qualification, as follows:

CREATE TABLE ... COMPRESS YES
ALTER TABLE ... COMPRESS YES

To determine whether a table has classic row compression enabled, query the COMPRESSION column in SYSCAT.TABLES. You will see a value of R or B for tables that have row compression enabled, with the latter indicating that the table also uses the alternate row format. In DB2 Galileo, you also need to query the ROWCOMPMODE column; for tables with classic row compression enabled, it contains a value of S.

In DB2 Version 9.1, you had to be very conscious of whether compression dictionaries existed, and if necessary you had to actively initiate their creation by means of a classic table reorganization or by running the INSPECT utility. The building process for the table-level dictionary became more automatic with the introduction of automatic dictionary creation (ADC) in DB2 Version 9.5: whenever a table that is enabled for compression exceeds a size of 1 MB, the next insertion triggers a dictionary build. Note, however, that no compression is applied to existing rows; only subsequent insertions or updates produce compressed rows.

Since DB2 Version 9.5, ADC and classic table reorganization are the two primary means of building table-level compression dictionaries. The two approaches, however, can differ in terms of the compression ratios that they yield. ADC creates the compression dictionary based on the data that exists at the time the dictionary is created. If the table is expected to grow or change, only a small subset of the data that will eventually be stored in that table exists when the dictionary is built, so pattern detection can only be performed on that small amount of data. During a classic table reorganization, on the other hand, the entire table is scanned, and an evenly distributed sample of records from the entire table is used to build the dictionary. As a consequence, dictionaries built with ADC might, over time, yield lower space savings than those built during a classic table reorganization.

The same challenge exists for a table whose data is frequently updated. Over time, the table-level dictionary may no longer contain the most efficient patterns for the changing data, which leads to a degradation of the compression ratio. In such cases, only regular classic table reorganizations can maintain consistently high storage savings.
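Putting the pieces together, enabling classic compression on an existing table and then rebuilding the dictionary through a classic (offline) reorganization might look like the following sketch; the table name is invented for illustration:

```sql
-- Enable classic row compression (DB2 Galileo syntax; in earlier
-- releases, omit the STATIC keyword).
ALTER TABLE DB2INST1.SALES COMPRESS YES STATIC;

-- Compress the existing rows and rebuild the table-level dictionary
-- from an evenly distributed sample of the entire table:
CALL SYSPROC.ADMIN_CMD('REORG TABLE DB2INST1.SALES RESETDICTIONARY');
```

The RESETDICTIONARY keyword forces a new table-level dictionary to be built from the current data, which is what counteracts the dictionary-aging effect described above.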

Adaptive Row Compression


DB2 Galileo introduces a new row compression flavor. This adaptive compression technique not only yields significantly better compression ratios in many cases, it also has the ability to adapt to changing data characteristics over time.

To create a table that has adaptive row compression enabled, use the following statement in DB2 Galileo:

CREATE TABLE ... COMPRESS YES ADAPTIVE

To enable adaptive row compression on an existing table, use the following:

ALTER TABLE ... COMPRESS YES ADAPTIVE

In DB2 Galileo, the default type of row compression is adaptive compression. Whereas enabling classic row compression requires the keyword STATIC with the COMPRESS YES clause, for adaptive compression specifying COMPRESS YES by itself is sufficient:

CREATE TABLE ... COMPRESS YES
ALTER TABLE ... COMPRESS YES

In DB2 Galileo, these two statements enable adaptive row compression. To determine whether a table is enabled for adaptive row compression, query the COMPRESSION and ROWCOMPMODE columns in SYSCAT.TABLES. You will see a value of R or B in the former, and a value of A in the latter. Note that when you upgrade a database from DB2 Version 9.8 or an earlier release, existing tables that had classic row compression enabled keep their compression settings. If you want to enable them for adaptive row compression, you have to use the ALTER TABLE statement shown above.

Adaptive compression builds on top of the classic row compression mechanism; that is, table-level compression dictionaries are still used. The table-level dictionary is complemented by page-level compression dictionaries, which contain entries for frequent patterns within a single page. The table-level dictionary helps eliminate repeating patterns in a global scope, while page-level dictionaries take care of locally repeating patterns.

Page-level dictionaries are maintained automatically. When a page becomes filled with data, the database manager builds a page-level compression dictionary for the data in that page. The database manager also automatically determines when to rebuild the dictionary for pages whose data patterns have changed significantly.

The result is not only higher overall compression savings. The adaptive nature of the algorithm also ensures that compression rates do not degrade as much over time as they do with classic row compression. In many practical cases, the compression rate remains nearly constant over time. Thus, by using adaptive compression you can significantly reduce the cost associated with monitoring compression ratios and performing maintenance (that is, classic, offline table reorganization) on the database to improve storage utilization.
It is important to note that performing a classic table reorganization will still give the best possible compression. However, the improvements that a classic table reorganization can provide are less significant, and very often even negligible.
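Enabling adaptive compression and checking the effective mode can be sketched as follows; the table name is invented for illustration:

```sql
-- Enable adaptive row compression (the DB2 Galileo default);
-- COMPRESS YES ADAPTIVE and plain COMPRESS YES are equivalent here.
ALTER TABLE DB2INST1.SALES COMPRESS YES ADAPTIVE;

-- Verify: COMPRESSION is 'R' (or 'B' if value compression is also on),
-- and ROWCOMPMODE is 'A' for adaptive, 'S' for classic.
SELECT COMPRESSION, ROWCOMPMODE
  FROM SYSCAT.TABLES
 WHERE TABSCHEMA = 'DB2INST1' AND TABNAME = 'SALES';
```

As with classic compression, rows that already exist are not rewritten by the ALTER TABLE statement itself; only subsequent inserts and updates, or a reorganization, produce adaptively compressed pages.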

Identifying Candidate Tables for Row Compression


You need to examine your database to determine which tables might be candidates for compression. Data compression is about saving storage initially (on existing uncompressed tables) and about optimizing storage growth in the future. Your storage pain points can be found in existing tables within your database, in tables where you anticipate increased growth over time, or both. Naturally, the largest tables will be easy candidates for compression, but do not overlook smaller tables. If you have hundreds or thousands of smaller tables, you may find a benefit resulting from the aggregate effect of compression over many smaller tables. "Large" and "small" are relative terms: your database design determines whether tables of a million or several million rows are large or small.

The following example uses the recommended SQL administration function ADMIN_GET_TAB_INFO to return an ordered list of all your table names and table data object sizes for a particular schema. To display the current compression settings and the row compression mode that is in use, join the result set with SYSCAT.TABLES, and select the COMPRESSION and ROWCOMPMODE columns.

SELECT SUBSTR(T.TABSCHEMA, 1, 10) AS TABSCHEMA,
       SUBSTR(T.TABNAME, 1, 10) AS TABNAME,
       SUM(TI.DATA_OBJECT_P_SIZE)/1024/1024 AS STORAGESIZE_GB,
       T.COMPRESSION AS COMPRESSION,
       T.ROWCOMPMODE AS ROWCOMPMODE
FROM TABLE (SYSPROC.ADMIN_GET_TAB_INFO('DB2INST1', '')) TI
JOIN SYSCAT.TABLES T
  ON T.TABSCHEMA = TI.TABSCHEMA AND T.TABNAME = TI.TABNAME
GROUP BY T.TABSCHEMA, T.TABNAME, T.COMPRESSION, T.ROWCOMPMODE
ORDER BY STORAGESIZE_GB DESC

The result of this query identifies the top space consumers in your current database. This gives you a list of candidate tables to start working with. In this example, none of the tables are compressed, and they all use the standard row format.

TABSCHEMA  TABNAME    STORAGESIZE_GB COMPRESSION ROWCOMPMODE
---------- ---------- -------------- ----------- -----------
DB2INST1   ACCTCR                 14 N
DB2INST1   BKPF                   12 N
DB2INST1   BSIS                   12 N
DB2INST1   CDCLS                  10 N
DB2INST1   CDHDR                   9 N
DB2INST1   COSP                    7 N

  6 record(s) selected.

In many practical scenarios, the bulk of the storage space of an entire database is occupied by relatively few tables. When using classic row compression, small tables under a few hundred kB may not be good candidates for compression, because the space savings that can be achieved may not offset the storage requirements of the data compression dictionary (approximately 100 kB). The dictionary is stored within the physical table data object itself. As a rule of thumb, consider compressing small tables that you expect to grow to 2 MB and larger. With adaptive compression, no such considerations are necessary: page-level dictionaries are built even if the table-level dictionary does not exist, so space savings are achieved from the very beginning, even for very small tables.

Once you have determined candidate tables based on their storage consumption, you should consider the typical SQL activity against the data in those tables. Tables that are read-only are great candidates for row compression. If a table experiences only some updates, it is likely to be a good candidate. Tables that undergo heavy updates may not be as good candidates for row compression. Tables with a read/write ratio of 70% reads or higher and 30% writes or lower are excellent candidates for row compression. Whether it is better to use classic or adaptive row compression does not depend so much on the data access patterns as on the actual compression savings that can be achieved with either flavor. That decision is made later in the process. Consider using a test environment to benchmark or assess performance when compression is in effect in your environment.

Complex queries tend to manipulate rows sequentially. With row compression, every data page can contain more rows. Thus, queries on a compressed table need fewer I/O operations to access the same amount of data. Moreover, since rows remain compressed when the pages that contain them are loaded into the buffer pool, buffer pool hit ratios improve because more rows reside on each data page. This generally leads to performance improvements. For queries that handle data in a less sequential fashion, because the result set is made up of few rows (as in an OLTP environment), the performance benefits of row compression might not be quite as great.

Row compression performs best in I/O-bound or memory-bound environments, that is, where the workload is not bottlenecked on the CPU. With row compression, extra CPU cycles are required to compress and expand data rows whenever they are accessed or modified. This overhead can be offset by efficiencies gained in doing fewer I/O operations. The feature works very well on decision-support workloads composed of complex analytic queries that perform massive aggregations, where row access is mostly sequential rather than random.

Following the implementation of row compression, user data written to log records as a result of INSERT, UPDATE, and DELETE activity will generally be smaller.
However, it is possible that some UPDATE log records are larger when compressed than when not using compression. When a compressed row is updated, even if the uncompressed record lengths do not change, the compressed images can. Therefore, there may be fewer occasions for the database manager to use certain logging-related optimizations that are only possible if the record length remains the same. Besides compression, there are other factors that affect log space consumption for UPDATE operations, such as whether or not the updated table is enabled for replication (via DATA CAPTURE CHANGES), whether or not queries use currently-committed semantics, and where the updated columns are positioned in the table. The DB2 Galileo Information Center has some more information on column ordering and its effects on update logging. Starting in DB2 Galileo, you can also reduce the amount of space that is taken up by log archives by compressing them. This feature is independent of whether or not row compression is enabled, and it does not affect the actual amount of data that needs to be logged. Log archive compression is described later in this document.


Estimating Storage Space Savings for Row Compression


Once you have determined the list of candidate tables for compression, based on storage consumption and data access characteristics, it is time to determine how much space savings you can expect to achieve. DB2 provides means for you to answer that question individually for each of the candidate tables by estimating the expected storage space savings. You can estimate the storage space savings before you enable the tables for row compression, and you can even perform this task if you do not have a license for the Storage Optimization Feature. As with the compression functionality in general, the mechanisms for compression estimation have evolved over time, always aiming to provide you with quicker and simpler ways to perform this task. The mechanism of choice depends on which version of DB2 you are using. All the tools and functions available in older releases are also available in DB2 Galileo, but you may find them more cumbersome to use, or slower to respond with the desired data.

Compression Estimation in DB2 Galileo


In DB2 Galileo, the preferred way of estimating compression ratios is to use the administrative function ADMIN_GET_TAB_COMPRESS_INFO. This function can be used to estimate compression savings for one given table, for all tables in a given schema, or for all tables in the database. The function takes the schema and table names as optional input parameters, and calculates current compression savings as well as savings projections for classic and adaptive compression. The following example returns the values for all tables in schema DB2INST1:

SELECT SUBSTR(TABNAME,1,10) AS TABNAME,
       PCTPAGESSAVED_CURRENT,
       PCTPAGESSAVED_STATIC,
       PCTPAGESSAVED_ADAPTIVE
FROM TABLE(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO('DB2INST1', ''))

The table and schema names can be empty or NULL, in which case the function computes the estimates for all tables in the given schema (when the table name is not specified) or for all tables in the database (when both names are not specified). The result columns give you estimates of the current compression savings (PCTPAGESSAVED_CURRENT), the optimal compression savings that can be achieved with classic row compression (PCTPAGESSAVED_STATIC), and the optimal compression savings that can be achieved with adaptive row compression (PCTPAGESSAVED_ADAPTIVE).

Note that if your schema or database contains hundreds or thousands of tables, the processing time may be substantial. If this is the case, restrict your queries to compute estimates only for those tables that you are considering compressing.


The result of this query can be used to determine whether it is advisable to enable row compression, and which row compression mode to choose. For example, consider the following result set of the above query:

TABNAME    PCTPAGESSAVED_CURRENT PCTPAGESSAVED_STATIC PCTPAGESSAVED_ADAPTIVE
---------- --------------------- -------------------- ----------------------
ACCTCR                         0                   68                     76
BKPF                           0                   83                     90
BSIS                           0                   82                     89
CDCLS                          0                   11                     17
CDHDR                          0                   70                     73
COSP                           0                   87                     93

  6 record(s) selected.

Of the six tables for which the compression estimates have been computed, five show very good compression potential. Only table CDCLS compresses very poorly; the compression savings might not justify enabling compression on this particular table. The rest of the tables are very good candidates for row compression.

Calculating the differences between PCTPAGESSAVED_STATIC and PCTPAGESSAVED_ADAPTIVE helps you determine the best choice for the row compression mode. In the above example, the estimated values indicate that adaptive compression would reduce the storage size for table CDCLS by only another 10% over classic row compression. You may not want to enable adaptive compression for this table, unless you expect the data characteristics to change significantly over time, or you expect lots of new data to be inserted in the future. For the rest of the tables, adaptive compression may be the better choice because of the significant additional storage space savings of up to 50%.

Note that the scale of PCTPAGESSAVED is not linear. For example, consider the CDCLS and COSP tables. For CDCLS, we expect 11% space savings with classic and 17% with adaptive row compression. For COSP, the estimated savings are 87% with classic and 93% with adaptive row compression. While the difference is six percentage points in both cases, the relative savings that adaptive row compression can achieve over classic row compression are different. Assume each of the tables is 100 GB in size when compression is not enabled. For CDCLS, the estimated size of the table would be 89 GB with classic and 83 GB with adaptive row compression, a difference of about 7%. On the other hand, the estimated sizes for COSP are 13 GB with classic and 7 GB with adaptive row compression, a difference of about 46%.

The values in the PCTPAGESSAVED_CURRENT and AVGROWSIZE_CURRENT columns give you an indication of how well your current data compresses, given that compression is enabled for a given table. The measurements for the current savings are similar to what RUNSTATS would calculate for the PCTPAGESSAVED and AVGROWSIZE columns in SYSCAT.TABLES, except that they are always up to date. In other words, you do not need to run RUNSTATS to get an accurate picture of what you are currently saving. This can help you decide which tables to switch from classic to adaptive row compression when you upgrade your database to DB2 Galileo, and it can help you monitor the compression ratio over time.
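The percentage-point arithmetic above can be made concrete with a short script. The figures are the ones from the example result set; the helper names are invented for illustration:

```python
def size_after(uncompressed_gb, pct_pages_saved):
    """Estimated table size after compression, from a PCTPAGESSAVED-style value."""
    return uncompressed_gb * (1 - pct_pages_saved / 100)

def relative_gain_pct(pct_static, pct_adaptive, uncompressed_gb=100):
    """Relative size reduction of adaptive over classic row compression."""
    static_gb = size_after(uncompressed_gb, pct_static)
    adaptive_gb = size_after(uncompressed_gb, pct_adaptive)
    return (static_gb - adaptive_gb) / static_gb * 100

# CDCLS: 11% vs 17% pages saved -> 89 GB vs 83 GB, about 7% relative gain
print(round(relative_gain_pct(11, 17)))   # -> 7
# COSP: 87% vs 93% pages saved -> 13 GB vs 7 GB, about 46% relative gain
print(round(relative_gain_pct(87, 93)))   # -> 46
```

This is why two equal percentage-point differences can translate into very different relative savings: the closer the classic savings already are to 100%, the more each additional point is worth.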


The processing time of the ADMIN_GET_TAB_COMPRESS_INFO function in DB2 Galileo has been significantly reduced, compared to the similar administrative functions in earlier versions of DB2. The database manager now performs scan sampling, which drastically reduces the amount of data that needs to be read before an accurate prediction can be made. You will experience the most dramatic improvements with very large tables.

Compression Estimation in DB2 Versions 9.5 and 9.7


If you are using DB2 version 9.5 or 9.7, administrative functions similar to the above are available, although they have slightly different names and signatures. In DB2 version 9.7 the function is called ADMIN_GET_TAB_COMPRESS_INFO_V97, and in DB2 version 9.5 it was called ADMIN_GET_TAB_COMPRESS_INFO. Unlike the function in DB2 Galileo, which takes two input parameters, these functions take three. The third parameter is the execution mode; for compression estimation, pass the string ESTIMATE as its value. The result set returned also differs significantly from that of the DB2 Galileo function.

For the same set of tables as in the above example, you would perform the compression estimation in DB2 version 9.5 as follows:

SELECT SUBSTR(TABNAME, 1, 10) AS TABNAME,
       PAGES_SAVED_PERCENT
FROM TABLE(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO('DB2INST1', '', 'ESTIMATE'))

TABNAME    PAGES_SAVED_PERCENT
---------- -------------------
ACCTCR                      68
BKPF                        82
BSIS                        81
CDCLS                       11
CDHDR                       70
COSP                        87

  6 record(s) selected.

The processing time for this function may be significantly longer than in DB2 Galileo, because a full table scan is performed for each table.

As in DB2 Galileo, the results returned by the administrative function can help you decide whether to enable row compression, and on which of your candidate tables. You can also use these administrative functions to monitor how compression ratios develop over time, to help you decide when a classic table reorganization may be necessary to improve compression. When you do that, do not compare the results of the REPORT and ESTIMATE modes. The REPORT mode does not return the current compression savings; instead, it reports the compression savings at the time the dictionary was built, which may have been a long time ago. Instead, make sure your statistics are current, and compare the PAGES_SAVED_PERCENT value returned by the administrative function in ESTIMATE mode with the value of the PCTPAGESSAVED column in SYSCAT.TABLES for the same table. If the latter is significantly lower than the estimated compression savings, this may be an indication that you should reorganize the table and have a new table-level dictionary built by that process.
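The comparison just described can be expressed as a single query. The following is a sketch that assumes the DB2 9.5/9.7 function shown above (which returns TABSCHEMA, TABNAME, and PAGES_SAVED_PERCENT columns) and current statistics; the 10-point gap threshold is an arbitrary illustration, not a recommendation:

```sql
-- Flag tables whose current savings (RUNSTATS) lag well behind
-- the estimated savings (ESTIMATE mode) -- reorg candidates.
SELECT SUBSTR(E.TABNAME, 1, 10) AS TABNAME,
       E.PAGES_SAVED_PERCENT    AS ESTIMATED_PCT,
       T.PCTPAGESSAVED          AS CURRENT_PCT
FROM TABLE(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO('DB2INST1', '', 'ESTIMATE')) E
JOIN SYSCAT.TABLES T
  ON T.TABSCHEMA = E.TABSCHEMA AND T.TABNAME = E.TABNAME
WHERE T.PCTPAGESSAVED < E.PAGES_SAVED_PERCENT - 10
```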

Compression Estimation in DB2 Version 9.1


The first version of DB2 that supported row compression did not provide any of the administrative functions introduced in later releases. However, you can perform compression estimation by running the INSPECT utility with the ROWCOMPESTIMATE option:

INSPECT ROWCOMPESTIMATE TABLE NAME ACCTCR SCHEMA DB2INST1
    RESULTS acctcr.rowcompestimate.out

Running this utility produces an INSPECT output file in the diagnostic data directory path, which you then format with the db2inspf utility:

db2inspf acctcr.rowcompestimate.out acctcr.rowcompestimate.txt

In the formatted output file, the estimated compression savings are reported in text form as follows:

DATABASE: TDB2
VERSION : SQL10010
2011-11-07-18.39.23.555355

Action: ROWCOMPESTIMATE TABLE
Schema name: DB2INST1
Table name: ACCTCR
Tablespace ID: 4  Object ID: 8
Result file name: acctcr.rowcompestimate.out

Table phase start [...]
  Data phase start. Object: 8  Tablespace: 4
  Row compression estimate results:
  Percentage of pages saved from compression: 68
  Percentage of bytes saved from compression: 68
  Compression dictionary size: 32640 bytes.
  Expansion dictionary size: 32768 bytes.
  Data phase end.
Table phase end.
Processing has completed. 2011-11-07-18.40.08.349345

Note that the INSPECT utility also inserts the table-level dictionary into the target table if that table is enabled for row compression and a table-level dictionary does not yet exist. This allows you to build a compression dictionary in DB2 version 9.1 without performing a classic table reorganization. In later releases of DB2, you should not use the INSPECT utility to build a table-level dictionary; use ADC instead.

Index Compression
You can apply compression to indexes if you are using DB2 version 9.7 or a later release and have a license for the Storage Optimization Feature. Compressing indexes results in fewer leaf pages, which in many cases reduces the depth of the index tree. This allows a specific key to be found with fewer index page accesses and, as with row compression, yields higher buffer pool utilization and fewer physical I/O operations. In many practical cases, index compression can significantly improve query performance.

Index compression can be enabled for each index individually. When you create an index, the index inherits its compression setting from the underlying table, i.e., all indexes created on tables that are enabled for classic or adaptive row compression are compressed as well. You may want to verify that your database build scripts take this into account, and either create the tables with a COMPRESS YES clause in the first place, or ALTER the tables to enable row compression before creating any indexes on them.

Index compression uses a combination of three different techniques to reduce the amount of data stored on disk. Space savings are achieved by dynamically adjusting the number of index keys in a page rather than reserving space for the highest possible number of keys that could be stored there. In addition, the index entries, each of which consists of an index key and a RID list, are compressed by eliminating redundant prefixes from the keys in a single page and by applying delta encoding to the RIDs in each list.

Like adaptive row compression (and unlike classic row compression), index compression is fully automatic. Optimal compression savings are maintained over time, i.e., there is no need to monitor indexes and potentially reorganize them to improve compression ratios.
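Because indexes inherit the compression setting at creation time, the order of DDL statements matters. A minimal sketch (table and index names are hypothetical):

```sql
-- Option 1: enable row compression at table creation time; the
-- index created afterwards inherits the compression setting.
CREATE TABLE DB2INST1.ORDERS (ID INT NOT NULL, NOTE VARCHAR(200))
  COMPRESS YES;
CREATE INDEX DB2INST1.ORDERS_IX1 ON DB2INST1.ORDERS (ID);

-- Option 2: for an existing table, enable compression BEFORE
-- creating further indexes, so they pick up the setting.
ALTER TABLE DB2INST1.ORDERS COMPRESS YES;
CREATE INDEX DB2INST1.ORDERS_IX2 ON DB2INST1.ORDERS (NOTE);
```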

Identifying Candidate Indexes for Index Compression


The best candidates for compression are indexes with long keys (e.g., multi-column indexes or indexes on character data) as well as indexes that contain many duplicate keys (e.g., indexes on a single low-cardinality column or very few such columns). In addition, indexes that involve variable-length columns (i.e., VARCHAR) may benefit from the better space utilization that dynamic index page space management offers.

Multi-column indexes generally compress better if the leading key columns have low to medium cardinality and the higher-cardinality or unique columns appear towards the end of the key column list. Depending on the characteristics of the queries you expect your indexes to support, reordering columns may or may not be possible without sacrificing query performance. Especially in data warehousing scenarios, where the majority of queries aggregate over a large number of qualifying rows and query characteristics are generally more predictable than in OLTP workloads, it may be worthwhile to check the order and cardinalities of the columns of your larger indexes.


Examples that may benefit less from compression are indexes on a single unique numeric column, or unique multi-column indexes that have the highest-cardinality column as the leading index column.
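To inspect key column order against column cardinalities, you could join the SYSCAT.INDEXCOLUSE and SYSCAT.COLUMNS catalog views. This is a sketch; the schema name is an example, and statistics must be current for COLCARD to be meaningful:

```sql
-- List each index's key columns in order, with their cardinalities.
-- Low-cardinality leading columns suggest good compression potential.
SELECT SUBSTR(U.INDNAME, 1, 20) AS INDNAME,
       U.COLSEQ,
       SUBSTR(U.COLNAME, 1, 20) AS COLNAME,
       C.COLCARD
FROM   SYSCAT.INDEXCOLUSE U
JOIN   SYSCAT.INDEXES     I ON I.INDSCHEMA = U.INDSCHEMA
                           AND I.INDNAME   = U.INDNAME
JOIN   SYSCAT.COLUMNS     C ON C.TABSCHEMA = I.TABSCHEMA
                           AND C.TABNAME   = I.TABNAME
                           AND C.COLNAME   = U.COLNAME
WHERE  I.TABSCHEMA = 'DB2INST1'
ORDER  BY U.INDNAME, U.COLSEQ
```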

Estimating Storage Space Savings for Index Compression


When deciding which indexes to compress, estimating the storage space savings plays a bigger role than it does for tables. The computational overhead associated with index compression is lower than for row compression, and storage space savings generally correspond more directly to overall performance improvements. If you can achieve significant space savings for any of your indexes in an OLTP workload, it is a good idea to compress them.

Similar to the row compression estimation functionality, DB2 provides an administrative function to project storage space savings for index compression. This function is called ADMIN_GET_INDEX_COMPRESS_INFO. It can be used to estimate compression savings for a single index, for all indexes on a given table, for all indexes in a given schema, or for all indexes in the database. You control the execution mode of the function with the first three of its five input parameters, i.e., the object type, schema, and name. If you want to estimate space savings for a single index or for all indexes in a given schema, specify an object type of 'I'. If you want to estimate space savings for all indexes on a given table, or on all tables in a schema or database, specify an object type of 'T' (or NULL or the empty string). Depending on the value of the object type parameter, the object name and schema parameters identify the desired table or index, or can be left as NULL or empty strings if you want to estimate compression savings for a range of indexes.

For example, you would use the following query to obtain estimates of the compression savings for all indexes on the table BKPF, on all partitions:

SELECT SUBSTR(INDNAME, 1, 20) AS INDNAME,
       COMPRESS_ATTR,
       PCT_PAGES_SAVED
FROM TABLE(SYSPROC.ADMIN_GET_INDEX_COMPRESS_INFO('T', 'DB2INST1', 'BKPF', NULL, NULL))

The result set of this query would, for example, look like this:

INDNAME              COMPRESS_ATTR PCT_PAGES_SAVED
-------------------- ------------- ---------------
BKPF~0               N                          46
BKPF~1               N                          57
BKPF~2               N                          71
BKPF~3               N                          71
BKPF~4               N                          45
BKPF~5               N                          71
BKPF~6               N                          70
BKPF~BUT             N                          71

  8 record(s) selected.

It shows that most of the indexes on this table compress very well, with about 70% space savings to be gained. If you invoke this function on indexes that are already compressed, the reported compression savings reflect the actual savings rather than estimates. Alternatively, you can obtain information about the current savings generated by index compression from the PCTPAGESSAVED column of the SYSCAT.INDEXES catalog view, which is maintained by RUNSTATS.
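To estimate savings for one specific index instead of a whole table, the same function can be called with object type 'I'; the index name here is taken from the example above:

```sql
-- Estimate compression savings for a single index by passing
-- object type 'I' plus the index schema and name.
SELECT COMPRESS_ATTR, PCT_PAGES_SAVED
FROM TABLE(SYSPROC.ADMIN_GET_INDEX_COMPRESS_INFO(
       'I', 'DB2INST1', 'BKPF~0', NULL, NULL))
```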

Adoption Strategies
Once you have determined the set of tables and indexes that you are going to compress, the next step is to enable these tables for compression. Changing the compression settings for existing tables that contain data helps slow their growth rate, i.e., data inserted from then on benefits from the new compression settings. If your goal is to reduce the footprint of your existing data, perhaps because you want to reduce the physical space allocated by your database, compression needs to be applied to that existing data as well.

This topic discusses different strategies that you can employ to apply compression to large amounts of existing data, free up the disk space currently consumed by that data, and release this space to the file system. You will find this information most useful when you first implement row or index compression, or when you are upgrading from earlier versions to DB2 Galileo.

Applying Compression to Existing Data


The most straightforward way to apply compression to existing data is to perform a classic table reorganization. If you changed the settings for row or value compression, perform a REORG TABLE operation with the RESETDICTIONARY clause. If some of your tables have XML columns, you should specify the LONGLOBDATA clause as well to compress the non-inlined XML documents. If you only changed the compression settings for indexes, a REORG INDEXES ALL operation is sufficient to bring the index data into compressed format; a full reorganization of the table data is not necessary. You may find this useful if you upgrade to DB2 version 9.7 and already have row compression enabled, because only the indexes need to be reorganized.

If you cannot take your tables offline, consider the ADMIN_MOVE_TABLE stored procedure. This utility has been available since DB2 version 9.7, and allows you to move data that is in an active table into a new data table object with the same name and the same or a different storage location. The data remains online for the duration of the move, and any modifications made while the move operation is in progress are automatically applied to the new table.
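The reorganization options just described might look as follows on the command line; the table names are taken from the earlier examples:

```sql
-- Rebuild the table-level dictionary and compress existing rows.
REORG TABLE DB2INST1.ACCTCR RESETDICTIONARY

-- For tables with XML columns, also compress non-inlined XML data.
REORG TABLE DB2INST1.BKPF LONGLOBDATA RESETDICTIONARY

-- If only index compression settings changed, reorganizing the
-- indexes alone is sufficient.
REORG INDEXES ALL FOR TABLE DB2INST1.BKPF
```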


The ADMIN_MOVE_TABLE procedure provides a way to effectively reorganize a table in exactly the same way a classic table reorganization would, but without causing any downtime. In particular, the table-level dictionary is built or rebuilt with the same quality as during a reorganization. For that purpose, the ADMIN_MOVE_TABLE procedure computes a Bernoulli sample of all the rows in the table, builds a new table-level compression dictionary from that sample, and inserts the dictionary into the target table before starting the copy phase, so that rows are compressed as they are inserted into the target table during the copy phase.

The following is the most straightforward way to initiate an online table move operation:

CALL SYSPROC.ADMIN_MOVE_TABLE('DB2INST1', 'ACCTCR',
     '', '', '', '', '', '', '', '', 'MOVE')

The ADMIN_MOVE_TABLE_UTIL procedure allows you to exercise fine-grained control over the parameters and execution of the online table move operation, such as the size of the sample used for dictionary building. However, even if you just invoke the procedure in the standard MOVE execution mode, the compression dictionary that gets built is optimal. Unlike classic table reorganization, the default behavior of ADMIN_MOVE_TABLE is to rebuild the dictionary. Because only a small random sample of the table's data is used, this typically does not cause excessive performance overhead compared to not rebuilding the dictionary.

You may find the ADMIN_MOVE_TABLE procedure particularly useful when you have upgraded your database from an earlier release of DB2. It can be used to move tables between table spaces and to change other parameters, such as column data types, at the same time.

Managing Disk Space Consumption


Applying compression to existing tables reduces the number of pages allocated by those tables. However, it does not immediately affect the disk space consumption of your DMS or automatic storage table spaces; those table spaces will merely have a larger number of unused extents.

When you first implement row and index compression, it is good practice to re-evaluate the utilization of your table space containers as well as the projected growth rate of your data. With compression, your table space utilization may be significantly reduced. It may take a long time until your database grows enough to consume all the space that you initially freed up, in which case you may want to consider reducing the container sizes and making more disk space available to other consumers.

Effectively reducing the on-disk footprint of your database requires that you reduce the size of your table space containers. The key factor that determines by how much table space containers can be reduced in size is the high water mark. The high water mark reflects the location of the highest extent in a table space that is allocated by a table or used as a space map entry.

The high water mark may rise when new tables are created, when existing tables are extended, or when a classic table reorganization is performed that does not use a temporary table space. The high water mark may be lowered by dropping or truncating the table that owns the extent at the high water mark. However, dropping, truncating, or reorganizing other objects does not affect the high water mark; it merely creates more unused extents below it. Unused extents below the high water mark can only be re-used when new objects are created or existing objects grow. Only unused extents above the high water mark can be reclaimed and returned to the operating system.

Consequently, the key to reducing the on-disk footprint is to lower the high water mark and truncate the table space containers. The remainder of this section describes the mechanisms available to perform this task. Those mechanisms depend on what kind of table spaces you are using, which version of DB2 was used to create them, and which version of DB2 you are currently running.
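A quick way to see where the high water mark sits relative to the pages actually in use is the MON_GET_TABLESPACE table function (available in DB2 9.7 and later); the table space name is just an example:

```sql
-- Compare the high water mark (TBSP_PAGE_TOP) with the pages in use.
-- A large gap indicates unused extents below the high water mark.
SELECT SUBSTR(TBSP_NAME, 1, 15) AS TBSP_NAME,
       TBSP_USED_PAGES,
       TBSP_PAGE_TOP
FROM TABLE(MON_GET_TABLESPACE('TBSP_USR_D_1', NULL))
```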

Reclaimable Storage in DB2 Version 9.7


Non-temporary DMS and automatic storage table spaces created in DB2 version 9.7 or later support reclaimable storage. For those table spaces, the database manager can change the physical location of extents in the containers. This mechanism is called extent movement, and it can be used to consolidate all the unused space in table space containers at the physical end of the containers. Once that consolidation has taken place, the high water mark can be lowered, and the table space containers can subsequently be shrunk to the smallest possible size.

You can use the ALTER TABLESPACE statement to lower the high water mark and shrink table space containers. The process of lowering the high water mark involves extent movement and may thus take some time. Once the high water mark is lowered, containers can be reduced in size by returning unused extents above the high water mark to the file system.

To determine by how much the size of a DMS or automatic storage table space can be reduced, use the MON_GET_TABLESPACE table function:

SELECT SUBSTR(TBSP_NAME, 1, 15) AS TBSP_NAME,
       TBSP_FREE_PAGES,
       TBSP_PAGE_SIZE * TBSP_FREE_PAGES / 1024 / 1024 AS TBSP_RECLAIMABLE_MB
FROM TABLE(MON_GET_TABLESPACE('TBSP_USR_D_1', NULL))

The result set returned by this function may look like this, indicating that this particular table space can be reduced by about 23 million pages, worth roughly 365,000 MB of on-disk footprint:

TBSP_NAME       TBSP_FREE_PAGES      TBSP_RECLAIMABLE_MB
--------------- -------------------- --------------------
TBSP_USR_D_1                23369760               365150

  1 record(s) selected.

Automatic storage table spaces can easily be shrunk to their most compact form by using the following statement:

ALTER TABLESPACE TS_DAT_A REDUCE MAX

For DMS table spaces, lowering the high water mark and reducing the containers is a two-step process that is performed with the following two statements:

ALTER TABLESPACE TS_DAT_D LOWER HIGH WATER MARK
ALTER TABLESPACE TS_DAT_D REDUCE (ALL CONTAINERS 100 G)

Note that when you upgrade your database from an older version of DB2 to DB2 version 9.7 or a more recent release, you should create new table spaces to replace all your existing automatic storage or DMS table spaces, and use the ADMIN_MOVE_TABLE procedure to move your tables from the old to the new table spaces. This ensures you get the benefits of reclaimable storage. You should enable row and index compression before moving the tables, so that they are compressed in the process of moving.

Reducing Container Sizes in DB2 Version 9.5 and Earlier


For DMS and automatic storage table spaces created with DB2 version 9.5 and earlier, extent movement is not supported. Therefore, the high water mark can only be lowered to the highest used extent. Space allocated for unused extents below this highest used extent cannot be reclaimed without first lowering the high water mark.

You can use the db2dart utility with the /DHWM option to determine what kind of extent is holding up the high water mark, and whether the high water mark can be lowered. If it can, db2dart with the /LHWM option lists the steps necessary to do so. If a table data object or an XML object is holding up the high water mark, performing a classic table reorganization may help lower it. If an index object is holding up the high water mark, reorganizing the indexes of that table may help lower it. If the high water mark is held up by an empty SMP extent, the db2dart utility provides the /RHWM option, which can discard this SMP extent.

Please note that when using db2dart to lower the high water mark, you need to deactivate your database first, and the affected table space may be placed in backup-pending state, as described in Technote 1006526.

If you are on DB2 version 9.5 or an earlier release and cannot upgrade to a more recent version, it is strongly recommended to use a temporary table space when you perform the classic table reorganizations that apply compression to your tables. If possible in your environment, also separate the largest tables out into their own table space before attempting to compress them. If you have many tables in the same DMS or automatic storage table space, perform the classic table reorganizations starting with the smallest tables first. This ensures that the high water mark grows the least, and that any unused extents generated below the high water mark are re-used when the next larger table is reorganized.
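The db2dart checks described above could be invoked as follows. The database name and table space ID are placeholders, and the exact option syntax (e.g., how the table space ID is supplied) should be verified against the db2dart help for your DB2 version; the first line describes the extent holding the high water mark, the second lists the steps to lower it, and the third discards an empty SMP extent:

```
db2dart TDB2 /DHWM /TSI 4
db2dart TDB2 /LHWM /TSI 4
db2dart TDB2 /RHWM /TSI 4
```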

Verifying Row Compression and Avoiding Common Pitfalls


The compression settings for existing tables can be revealed by querying the system catalog. In SYSCAT.TABLES, the two columns COMPRESSION and ROWCOMPMODE contain information about the current compression settings. The COMPRESSION column indicates whether value compression or row compression is enabled. The ROWCOMPMODE column was introduced in DB2 Galileo, and indicates whether the classic or the adaptive flavor of row compression is in use.

For tables that have row compression enabled, a few more columns in SYSCAT.TABLES give you deeper insight into how row compression performs. These columns are all populated by RUNSTATS. The most important metric is PCTPAGESSAVED. This column gives you a measure of how much space you are currently saving on this table by means of row compression. From PCTROWSCOMPRESSED you can determine how many rows in the table are actually compressed, and AVGCOMPRESSEDROWSIZE provides the average size of those compressed rows.

In most practical scenarios, the percentage of compressed rows should be very near 100%. In this case, the average size of compressed rows is identical to the average row size that is maintained in the AVGROWSIZE column. If the percentage of compressed rows in a table is significantly lower than 100%, this may be due to one of the following common issues:

- Check whether your table resides in a regular table space. If it does, check the average row size of the table against the effective minimum row size for the table space's page size. For regular table spaces, the effective minimum row size is limited by the fact that each page can hold at most 255 rows. If your average row size is near or below that ratio, consider converting the regular table space to a large table space with the ALTER TABLESPACE statement and its CONVERT TO LARGE option, or use ADMIN_MOVE_TABLE to move your table to a large table space.

- If you enable row compression on an existing table with data in it, the table-level dictionary will be built automatically by means of ADC when the table grows further. Existing data remains in its uncompressed form until you perform a classic table reorganization or use the ADMIN_MOVE_TABLE procedure.

- You may also encounter this situation when you create a new table with classic row compression enabled and start populating it. The table-level dictionary is built automatically when the table size reaches the ADC threshold, i.e., when it grows beyond 2MB. However, the rows at the very beginning of the table remain uncompressed. If your table is only a few MB in size, this initial chunk of uncompressed rows may account for a significant fraction of the total rows stored in the table.

- If none of the above applies, check whether the table contains random data. For example, you may find that your table has a rather large CHAR FOR BIT DATA or inlined LOB column containing binary data that is already compressed in some form by the application layer. In such cases row compression may not be able to compress your data any further, and should not be enabled on those tables.
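The catalog columns discussed above can be pulled together in a single query; the schema name is an example:

```sql
-- Inspect compression settings and RUNSTATS-maintained compression
-- metrics for all tables in a schema.
SELECT SUBSTR(TABNAME, 1, 18) AS TABNAME,
       COMPRESSION, ROWCOMPMODE,
       PCTPAGESSAVED, PCTROWSCOMPRESSED,
       AVGROWSIZE, AVGCOMPRESSEDROWSIZE
FROM SYSCAT.TABLES
WHERE TABSCHEMA = 'DB2INST1'
ORDER BY PCTPAGESSAVED DESC
```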

Verifying Index Compression


As with row compression, you can query the system catalog to check index compression settings. The COMPRESSION column in SYSCAT.INDEXES provides information about the compression setting of each index. For indexes that are enabled for compression, the PCTPAGESSAVED column reports the storage space savings. This column is populated by RUNSTATS.

You should verify that index compression is enabled for all the indexes that you had considered for compression. Pay special attention to system-generated indexes (e.g., indexes that are implicitly created for primary key and unique constraints). If you created your table without enabling row compression, and subsequently used ALTER TABLE to enable row compression, those indexes will be uncompressed. You have to enable index compression for those indexes explicitly by using the ALTER INDEX statement.
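A sketch of that verification and fix-up follows; the index name in the ALTER statement is hypothetical, and a REORG INDEXES ALL is still needed afterwards to bring existing index data into compressed format:

```sql
-- Find uncompressed indexes on tables that use row compression.
SELECT SUBSTR(I.INDNAME, 1, 20) AS INDNAME,
       I.COMPRESSION, I.PCTPAGESSAVED
FROM SYSCAT.INDEXES I
JOIN SYSCAT.TABLES  T ON T.TABSCHEMA = I.TABSCHEMA
                     AND T.TABNAME   = I.TABNAME
WHERE T.TABSCHEMA = 'DB2INST1'
  AND T.COMPRESSION IN ('R', 'B')   -- row (or row+value) compression
  AND I.COMPRESSION = 'N';

-- Enable compression on an index that was missed.
ALTER INDEX DB2INST1.ORDERS_PK COMPRESS YES;
```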

Compression of Temporary Tables


Compression can be applied to temporary tables starting in DB2 version 9.7, provided you have a license for the Storage Optimization Feature. Unlike row and index compression, temporary tables do not need to be explicitly enabled for compression; this is done automatically, and applies to both user-defined and system temporary tables.

Temporary tables are used in different situations. There are user-defined global temporary tables, which come in two variants: created global temporary tables (CGTTs) and declared global temporary tables (DGTTs). Temporary tables may also be used by some utilities and maintenance operations, such as table reorganization and data redistribution. During query processing, the database manager may need temporary tables for certain operators that accumulate intermediate results, such as sorts, hash joins, or table queues.


The mechanism used to compress temporary tables is the same as the one used for classic row compression with ADC, though some of the runtime criteria are different. This is because most temporary tables, especially small ones, do not cause any physical I/O. Thus, the threshold at which the compression dictionary is built is 100MB instead of 2MB. This ensures that small temporary tables, which typically remain fully buffered, are not compressed, while larger temporary tables that may spill to disk will have their data compressed. It also ensures that data in large temporary tables utilizes the buffer pool more efficiently, which further helps avoid physical I/O.

Compression of Backup Images and Log Archives


Since version 8, DB2 has allowed you to compress your backup images. Starting in DB2 Galileo, you can also apply compression to your log archives. These functions are available regardless of whether you have licensed the Storage Optimization Feature. The default algorithm used to compress backup images and archived logs is similar to the one used by the compress(1) Unix utility.

Backup Compression
Backup compression can be enabled independently for each backup image. When performing the backup operation, specify the COMPRESS clause, for example as follows:

BACKUP DATABASE TDB2 TO /vol_aux/backups/ COMPRESS

When you use row and index compression, and you have followed the guidelines to reduce the container sizes for your table spaces, the overall size of your database is already greatly reduced. Consequently, your backup images will automatically be smaller than they would be if row and index compression were disabled. At the same time, the extra space savings you can achieve by compressing backup images may be greatly reduced. This particularly applies when you have many tables that use adaptive row compression. On the other hand, backup compression can also compress metadata, LOBs, the catalog tables, and other database objects that cannot be compressed by any other means.

When deciding whether to compress your backup images, the most important indicators to watch are the CPU and disk utilization rates while the backup is in progress. Backup compression can cause significant CPU overhead, so you should only use it if the backup process is bottlenecked on I/O. This can happen, for example, if you store your backup images on a volume that is not striped across different physical disks, or if you back up to network-attached or other non-local storage media. In those cases, you will increase CPU utilization throughout the backup process, but the savings in I/O overhead will effectively lead to a shorter backup window.


If you are storing your backups in Tivoli Storage Manager (TSM), it is recommended to use the compression and de-duplication functionality built into TSM. The de-duplication logic can yield significantly better overall storage space savings if you keep a series of recent backup images.

If you are using a mixture of tables, some of which use row and index compression while others don't, it is good practice to separate the tables and indexes into different table spaces based on their compression settings. When you take backups, consider taking them at the table space rather than the database level. You can then balance the compression settings, and only compress the backups of those table spaces that contain uncompressed tables and indexes.

Log Archive Compression


If you have configured your database for log archiving, compression for those log archives can be enabled via database configuration parameters. You can enable or disable log archive compression independently for the primary and secondary archiving methods and their corresponding locations. Log archive compression requires that the database manager handle the archiving process, i.e., your primary and/or secondary archiving method must be DISK, TSM, or VENDOR. The following example shows how to set up a primary, disk-based log archive with compression enabled:

UPDATE DB CFG FOR TDB2 USING LOGARCHMETH1 DISK:/vol_aux/archive/tdb2
UPDATE DB CFG FOR TDB2 USING LOGARCHCOMPR1 ON

For the secondary archiving method, compression can be enabled in a similar fashion.

Log archive compression, once enabled, is fully automatic. The database manager automatically compresses log extents when they are moved from the active log path to the archive location. Upon retrieval, i.e., during ROLLBACK and ROLLFORWARD operations, the database manager automatically expands compressed log files when they are moved from the archive into the active or overflow log paths. If, during recovery, the database manager encounters a compressed log extent in the active or overflow log paths, it automatically expands that extent. This may happen, for example, when users manually retrieve log extents from the archive location into the active or overflow log paths.

When deciding whether to enable compression on your log archives, considerations similar to those for backup compression apply. Check the I/O bandwidth available for your archive location, and determine whether it is the bottleneck during archiving. You should consider enabling compression if your log archives are on non-local volumes or on volumes that are not striped across a number of disks.
In addition, if you are archiving your logs to an on-disk staging location for subsequent transfer to tape via the db2tapemgr utility, you should also consider enabling compression, as it reduces the time it takes to transfer the archived logs to tape.
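You can verify the resulting configuration through the SYSIBMADM.DBCFG administrative view; this assumes the settings from the example above (parameter names appear in lowercase in that view):

```sql
-- Check the log archiving method and compression settings.
SELECT NAME, VALUE
FROM SYSIBMADM.DBCFG
WHERE NAME IN ('logarchmeth1', 'logarchcompr1')
```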


If you are archiving to TSM, consider using TSM's built-in compression facilities. The behavior for log archives will be very similar to using the compression facilities in DB2 Galileo, as the same algorithm is used. Unlike with backup compression, there is generally no additional benefit from de-duplication.

When you use row and index compression, some of the data written to the transaction log is already in compressed format, which effectively reduces the amount of data logged compared to what would be necessary if the same operations were performed on uncompressed tables. However, even if you use row and index compression widely, there is generally enough metadata and other uncompressed material in the transaction log that significant storage space savings can be achieved on almost any log archive.

Conclusion
Applying the compression techniques provided by DB2's Storage Optimization Feature can significantly reduce current and future storage space requirements and improve performance, especially in the common case where your workload is I/O-bound. In conjunction with the other compression features for administrative data, it is not difficult to realize a space-conscious storage strategy for your large OLTP workload or data warehouse.


Further reading
DB2 Best Practices - http://www.ibm.com/developerworks/db2/bestpractices/


Notices
This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions; therefore, this statement may not apply to you.

Without limiting the above disclaimers, IBM provides no representations or warranties regarding the accuracy, reliability or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein.
The information contained in this document has not been submitted to any formal IBM test and is distributed AS IS. The use of this information or the implementation of any recommendations or techniques herein is a customer responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Anyone attempting to adapt these techniques to their own environment does so at their own risk.

This document and the information contained herein may be used solely in connection with the IBM products discussed in this document. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary.
Users of this document should verify the applicable data for their specific environment.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE: © Copyright IBM Corporation 2012. All Rights Reserved.

This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.

Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml

Windows is a trademark of Microsoft Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.