
Lab 03 – Storage and Data Load IBM Software

Lab 03 Storage and Data Load


Important things about LOAD for Columnar Tables
__1. When run with REPLACE, or with INSERT into an empty table, the LOAD utility builds the column
compression dictionaries. This happens during the ANALYZE phase.

__2. During the LOAD phase, the LOAD utility compresses values and data pages. It also updates the
synopsis table and builds keys for the page map index and any unique indexes.

__3. UTIL_HEAP_SZ should be as large as possible for the LOAD process. If the database server
has more than 128 GB of RAM, set UTIL_HEAP_SZ to at least 4 million pages.

__4. To get good compression, load a large amount of representative data.

__5. Avoid loading a small initial subset of data in the first load, because it can produce a poor
compression dictionary.

__6. Page-level compression is used for new values not covered by the column-level dictionary. This
reduces the need to rebuild the compression dictionary.

__7. Page-level compression also limits the deterioration of compression ratios over time.
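
As a quick check on item 3, the arithmetic below converts a UTIL_HEAP_SZ setting into memory. UTIL_HEAP_SZ is expressed in 4 KB pages; the function name is a hypothetical helper for illustration, not part of any DB2 tool.

```python
PAGE_SIZE_BYTES = 4096  # UTIL_HEAP_SZ is counted in 4 KB pages


def util_heap_gib(pages: int) -> float:
    """Return the memory, in GiB, that a UTIL_HEAP_SZ value represents (hypothetical helper)."""
    return pages * PAGE_SIZE_BYTES / 2**30


if __name__ == "__main__":
    # The 4 million pages suggested for servers with more than 128 GB of RAM
    print(f"{util_heap_gib(4_000_000):.2f} GiB")  # prints "15.26 GiB"
```

So the suggested 4 million pages corresponds to roughly 15 GiB of utility heap, a modest share of a server with more than 128 GB of RAM.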

Use of Load Utility


__8. In the GNOME terminal window, type cd3 to change to the Lab 03 directory.
$ cd3

__9. Run data01 to create DB2PSC.FACT_DX_COL as a column-organized table.


$ ./data01

IBM DB2 10.5 BLU Acceleration Page 31



__10. Run data02 to load 100,000 rows into it.


$ ./data02

__11. Notice the new ANALYZE phase for the column-organized table.

__12. In the ANALYZE phase, data is converted from row-organized format to column-organized format.
Histograms are built to track value frequency, and the compression dictionary is built from those
histograms.

__13. In the LOAD phase, raw data is converted from row-organized format to column-organized
format. The data pages are built using the compression dictionary, and the synopsis table is built.

__14. The keys for the page map index and any unique indexes are built.
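
The two phases described above can be pictured with a toy model. This is only a sketch of the idea (histogram first, dictionary from the histogram, then encoding), not DB2's actual algorithm or on-disk format; the function names and the sample column are invented for illustration.

```python
from collections import Counter


def build_dictionary(sample):
    """ANALYZE-style step: histogram the values, then give the most frequent values the smallest codes."""
    histogram = Counter(sample)
    # Most frequent first; ties are broken alphabetically so the result is deterministic
    ranked = sorted(histogram, key=lambda v: (-histogram[v], str(v)))
    return {value: code for code, value in enumerate(ranked)}


def encode(values, dictionary):
    """LOAD-style step: replace each value with its code; None marks a value the dictionary does not cover."""
    return [dictionary.get(v) for v in values]


if __name__ == "__main__":
    column = ["CA", "NY", "CA", "TX", "CA", "NY"]
    d = build_dictionary(column)
    print(d)                        # {'CA': 0, 'NY': 1, 'TX': 2}
    print(encode(["CA", "FL"], d))  # [0, None]
```

The `None` for "FL" shows why a small or unrepresentative first load hurts: values that were absent when the dictionary was built get no column-level code and must be handled some other way, which in DB2 is the job of page-level compression.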

Page 32 An IBM Proof of Technology



Check Percent Pages Saved


__15. Run data03 to check percent pages saved.
$ ./data03

__16. We will now load 1 million rows and check the compression ratio. Run data04, which is the same
as data02 but loads 1 million rows.
$ ./data04

__17. Run data03 to check percent pages saved.


$ ./data03

__18. Notice that the compression ratio increases from 1.61 to 3.03 when the number of rows
increases from 100,000 to 1,000,000.
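
To relate a compression ratio to a percent-pages-saved figure, the sketch below assumes the usual definition of the ratio as uncompressed pages divided by compressed pages. The helper names are hypothetical, and the exact columns that data03 reports are not shown in this lab text.

```python
def pct_pages_saved(ratio: float) -> float:
    """Percent of pages saved for a given compression ratio (uncompressed / compressed)."""
    return (1 - 1 / ratio) * 100


def compression_ratio(pct_saved: float) -> float:
    """Inverse conversion: compression ratio implied by a percent-pages-saved value."""
    return 100 / (100 - pct_saved)


if __name__ == "__main__":
    for ratio in (1.61, 3.03):
        print(f"ratio {ratio} -> {pct_pages_saved(ratio):.1f}% pages saved")
```

Under that assumption, a ratio of 1.61 corresponds to about 37.9% of pages saved and a ratio of 3.03 to about 67.0%.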


Check Synopsis Table


__19. Run data05.
$ ./data05

__20. Run gedit data05.log to see the contents of the synopsis table for SAMPLE_TAB.
$ gedit data05.log

__21. There is one entry for every 1024 rows, and it contains the minimum and maximum values for
each column. TSNMIN and TSNMAX (TSN stands for Tuple Sequence Number) are internal references
to the actual pages holding the data.

__22. We will explore how DB2 BLU Acceleration uses the synopsis table for data skipping in
Lab 05.

__23. Press CTRL-Q to quit from the gedit window.
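
The idea behind the synopsis table can be sketched in a few lines. This is a simplified model, assuming one entry per 1024 rows as described above and treating the TSN as a plain row position; the function names and the sample data are invented for illustration and do not reflect DB2 internals.

```python
ROWS_PER_ENTRY = 1024  # one synopsis entry per 1024 rows, as described above


def build_synopsis(column):
    """Return (tsn_min, tsn_max, col_min, col_max) for each block of up to 1024 rows."""
    synopsis = []
    for start in range(0, len(column), ROWS_PER_ENTRY):
        block = column[start:start + ROWS_PER_ENTRY]
        synopsis.append((start, start + len(block) - 1, min(block), max(block)))
    return synopsis


def blocks_to_scan(synopsis, low, high):
    """Keep only the entries whose [min, max] range can overlap the predicate low <= x <= high."""
    return [e for e in synopsis if e[3] >= low and e[2] <= high]


if __name__ == "__main__":
    amounts = list(range(5000))             # sorted toy data, 5000 rows
    syn = build_synopsis(amounts)
    print(len(syn))                         # 5
    print(blocks_to_scan(syn, 2000, 2100))  # [(1024, 2047, 1024, 2047), (2048, 3071, 2048, 3071)]
```

In this toy run only two of the five blocks can contain values between 2000 and 2100, so the other three never need to be read.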


Use of db2convert Utility


__24. The db2convert utility converts one or all row-organized user tables in a specified database into
column-organized tables. The row-organized tables remain online during command
processing. Internally, db2convert invokes the ADMIN_MOVE_TABLE stored procedure to
convert and move the table.

__25. Run data06 to create FACT_DX_ROW as a row-organized table.


$ ./data06

__26. Run data07 to load 1 million rows into it.


$ ./data07


__27. Run data11 to check the amount due for providers with a balance of more than $2000.
$ ./data11

__28. Run the data08 script, which runs two commands. The first command opens a new GNOME Terminal
window that runs the workload defined in data09 against the table being converted. The second,
the db2convert command, converts the row-organized table DB2PSC.FACT_DX_ROW into a
column-organized table while the workload is running on it.

$ ./data08

__29. After the balances of more than $2000 are cleared from the table, the workload finishes and
prompts you to press Y to close the command window.


__30. Watch the progress of the db2convert utility.

Note: Because the workload updated the table, the REPLAY phase
applies the changes using log records. This may take longer because
every single UPDATE was committed. After the SWAP phase, the utility tries to obtain a
Z table lock to drop and rename the table.

The whole operation is online, so the table can be converted without
any downtime.
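
The copy, replay, and swap sequence in the note can be pictured with a toy model. This is a conceptual sketch only: the staging list stands in for whatever change record DB2 keeps, and the function name and sample rows are invented for illustration.

```python
def online_move(source, changes_during_copy):
    """Toy model of an online table move: COPY, then REPLAY, then SWAP."""
    target = dict(source)                 # COPY: snapshot of the source rows
    staging = []                          # changes captured while the copy runs
    for key, value in changes_during_copy:
        source[key] = value               # the workload keeps updating the source
        staging.append((key, value))
    for key, value in staging:            # REPLAY: apply the captured changes to the target
        target[key] = value
    return target                         # SWAP: the target takes over the source's name


if __name__ == "__main__":
    balances = {101: 2500, 102: 1500}     # provider id -> balance (made-up rows)
    print(online_move(balances, [(101, 0)]))  # {101: 0, 102: 1500}
```

The more changes the workload makes during the copy, the longer the replay loop runs, which matches the note that committing every single UPDATE lengthens the REPLAY phase.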

__31. When db2convert finishes, you will see a message similar to the one shown below.

__32. Run data10 to check the DB2PSC.FACT_DX_ROW table.


$ ./data10

__33. Note that TABLEORG is C (column-organized) and the compression ratio is 3.


__34. The workload ran against the row-organized table while it was being converted and cleared the
balances of more than $2000. Now check the same on the converted table. Run data11.
$ ./data11

__35. Type clear in the command window.


$ clear

** End of Lab 03: Storage and Data Load
