You are on page 1of 9

DB2 Basics: An introduction to materialized query tables

01/02/2006 07:09 PM

DB2 Basics: An introduction to materialized query tables
Level: Introductory Roman Melnyk (roman_b_melnyk@hotmail.com), DB2 Information Development, IBM Canada Ltd. 08 Sep 2005 The definition of a materialized query table (MQT) is based upon the result of a query. MQTs can significantly improve the performance of queries. This article introduces you to MQTs, summary tables, and staging tables, and shows you, by way of working examples, how to get up and running with materialized query tables. A materialized query table (MQT) is a table whose definition is based upon the result of a query. The data that is contained in an MQT is derived from one or more tables on which the materialized query table definition is based. Summary tables (or automatic summary tables, ASTs), which are familiar to IBM® DB2® Universal Database™ (UDB) for Linux, UNIX®, and Windows® (DB2 UDB) users, are considered to be a specialized type of MQT. The fullselect that is part of the definition of a summary table contains a GROUP BY clause summarizing data from the tables that are referenced in the fullselect. You can think of an MQT as a kind of materialized view. Both views and MQTs are defined on the basis of a query. The query on which a view is based is run whenever the view is referenced; however, an MQT actually stores the query results as data, and you can work with the data that is in the MQT instead of the data that is in the underlying tables. Materialized query tables can significantly improve the performance of queries, especially complex queries. If the optimizer determines that a query or part of a query could be resolved using an MQT, the query might be rewritten to take advantage of the MQT. An MQT can be defined at table creation time as maintained by the system or maintained by the user. The following sections introduce you to these two types of MQTs, as well as summary tables and staging tables. The examples that follow require a connection to the SAMPLE database; if you don’t have the SAMPLE database created on your system, you can create it by entering the db2sampl command from any command prompt.

Maintained by system MQTs
The data in this type of materialized query table is maintained by the system. When you create this type of MQT, you can specify whether the table data will be a REFRESH IMMEDIATE or REFRESH DEFERRED. The REFRESH keyword lets you specify how the data is to be maintained. DEFERRED means that the data in the table can be refreshed at any time using the REFRESH TABLE statement. Neither REFRESH DEFERRED nor REFRESH IMMEDIATE system-maintained MQTs allow insert, update, or delete operations to be executed against them. However, REFRESH IMMEDIATE system-maintained MQTs are updated with changes made to the underlying tables as a result of insert, update, or delete operations. Listing 1 shows an example of creating a REFRESH IMMEDIATE system-maintained MQT. The table, which is named EMP, is based on the underlying tables EMPLOYEE and DEPARTMENT in the SAMPLE database. Because REFRESH IMMEDIATE MQTs require that at least one unique key from each table referenced in the query appear in the select list, we first define a unique constraint on the EMPNO column in the EMPLOYEE table and on the DEPTNO column in the DEPARTMENT table. The DATA INITIALLY DEFERRED clause simply means that data will not be inserted into the table as part of the CREATE TABLE statement. After being
http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0509melnyk/ Page 1 of 9

DB2 Basics: An introduction to materialized query tables

01/02/2006 07:09 PM

simply means that data will not be inserted into the table as part of the CREATE TABLE statement. After being created, the MQT is in check pending state (see Demystifying table and table space states), and cannot be queried until the SET INTEGRITY statement has been executed against it. The IMMEDIATE CHECKED clause specifies that the data is to be checked against the MQT's defining query and refreshed; the NOT INCREMENTAL clause specifies that integrity checking is to be done on the whole table. A query executed against the EMP materialized query table shows that it is now fully populated with data. Listing 1. Creating an MQT that is to be maintained by the system
connect to sample ... alter table employee add unique (empno) alter table department add unique (deptno) create table emp as (select e.empno, e.firstnme, e.lastname, e.phoneno, d.deptno, substr(d.deptname, 1, 12) as department, d.mgrno from employee e, department d where e.workdept = d.deptno) data initially deferred refresh immediate set integrity for emp immediate checked not incremental select * from emp EMPNO -----000010 000020 000030 000050 000060 000070 000090 000100 000110 000120 000130 ... 000340 FIRSTNME -----------CHRISTINE MICHAEL SALLY JOHN IRVING EVA EILEEN THEODORE VINCENZO SEAN DOLORES JASON LASTNAME --------------HAAS THOMPSON KWAN GEYER STERN PULASKI HENDERSON SPENSER LUCCHESSI O'CONNELL QUINTANA GOUNOT PHONENO ------3978 3476 4738 6789 6423 7831 5498 0972 3490 2167 4578 5698 DEPTNO -----A00 B01 C01 E01 D11 D21 E11 E21 A00 A00 C01 E21 DEPARTMENT -----------SPIFFY COMPU PLANNING INFORMATION SUPPORT SERV MANUFACTURIN ADMINISTRATI OPERATIONS SOFTWARE SUP SPIFFY COMPU SPIFFY COMPU INFORMATION MGRNO -----000010 000020 000030 000050 000060 000070 000090 000100 000010 000010 000030

SOFTWARE SUP 000100

32 record(s) selected. connect reset

Maintained by user MQTs
The data in this type of materialized query table is maintained by the user. Only a REFRESH DEFERRED materialized query table can be defined as MAINTAINED BY USER. The REFRESH TABLE statement (used for system-maintained MQTs) cannot be issued against user-maintained MQTs. User-maintained MQTs do allow insert, update, or delete operations to be executed against them. Listing 2 shows an example of creating a REFRESH DEFERRED user-maintained MQT. The table, which is named ONTARIO_1995_SALES_TEAM, is based on the underlying tables EMPLOYEE and SALES in the
http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0509melnyk/ Page 2 of 9

DB2 Basics: An introduction to materialized query tables

01/02/2006 07:09 PM

named ONTARIO_1995_SALES_TEAM, is based on the underlying tables EMPLOYEE and SALES in the SAMPLE database. Again, the DATA INITIALLY DEFERRED clause means that data will not be inserted into the table as part of the CREATE TABLE statement. After being created, the MQT is in check pending state (see Demystifying table and table space states), and cannot be queried until the SET INTEGRITY statement has been executed against it. The MATERIALIZED QUERY IMMEDIATE UNCHECKED clause specifies that the table is to have integrity checking turned on, but is to be taken out of check pending state without being checked for integrity violations. Next, to populate the MQT with some data, we will import data that had been exported from the EMPLOYEE and SALES tables. The exporting query matches the defining query for the MQT. Then we will insert another record into the ONTARIO_1995_SALES_TEAM table. A query executed against the ONTARIO_1995_SALES_TEAM materialized query table shows that it is now fully populated with the imported and inserted data, demonstrating that user-maintained MQTs can indeed be modified directly. Listing 2. Creating an MQT that is to be maintained by the user
connect to sample ... create table ontario_1995_sales_team as (select distinct e.empno, e.firstnme, e.lastname, e.workdept, e.phoneno, 'Ontario' as region, year(s.sales_date) as year from employee e, sales s where e.lastname = s.sales_person and year(s.sales_date) = 1995 and left(s.region, 3) = 'Ont') data initially deferred refresh deferred maintained by user set integrity for ontario_1995_sales_team materialized query immediate unchecked export to ontario_1995_sales_team.del of del select distinct e.empno, e.firstnme, e.lastname, e.workdept, e.phoneno, 'Ontario' as region, year(s.sales_date) as year from employee e, sales s where e.lastname = s.sales_person and year(s.sales_date) = 1995 and left(s.region, 3) = 'Ont' ... Number of rows exported: 2 import from ontario_1995_sales_team.del of del insert into ontario_1995_sales_team ... Number of rows committed = 2 insert into ontario_1995_sales_team values ('006900', 'RUSS', 'DYERS', 'D44', '1234', 'Ontario', 1995) select * from ontario_1995_sales_team EMPNO -----000110 000330 006900 FIRSTNME -----------VINCENZO WING RUSS LASTNAME --------------LUCCHESSI LEE DYERS WORKDEPT -------A00 E21 D44 PHONENO ------3490 2103 1234 REGION YEAR ------- ----------Ontario 1995 Ontario 1995 Ontario 1995

3 record(s) selected. connect reset
http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0509melnyk/ Page 3 of 9

DB2 Basics: An introduction to materialized query tables

01/02/2006 07:09 PM

connect reset

Summary tables
You will recall that a summary table is a specialized type of MQT whose fullselect contains a GROUP BY clause summarizing data from the tables that are referenced in the fullselect. Listing 3 shows a simple example of creating a summary table. The table, which is named SALES_SUMMARY, is based on the underlying table SALES in the SAMPLE database. Once again, the DATA INITIALLY DEFERRED clause means that data will not be inserted into the table as part of the CREATE TABLE statement. The REFRESH DEFERRED clause means that the data in the table can be refreshed at any time using the REFRESH TABLE statement. A query against this MQT right after it was created, but before the REFRESH TABLE statement was issued, returns an error. After the REFRESH TABLE statement executes, the query runs successfully. A subsequent insert operation into the SALES table, followed by a summary table refresh and a query against the summary table, shows that the change to the underlying table is reflected in the summary table: salesperson Lee's total sales in the Ontario-South region have increased by 100. Similar behavior can be observed in response to update or delete operations against the underlying SALES table. Listing 3. Creating a summary table
connect to sample ... create table sales_summary as (select sales_person, region, sum(sales) as total_sales from sales group by sales_person, region) data initially deferred refresh deferred select * from sales_summary SALES_PERSON REGION TOTAL_SALES --------------- --------------- ----------SQL0668N Operation not allowed for reason code "1" on table "MELNYK.SALES_SUMMARY". SQLSTATE=57016 refresh table sales_summary select * from sales_summary SALES_PERSON --------------GOUNOT GOUNOT GOUNOT GOUNOT LEE LEE LEE LEE LUCCHESSI LUCCHESSI LUCCHESSI REGION TOTAL_SALES --------------- ----------Manitoba 15 Ontario-North 1 Ontario-South 10 Quebec 24 Manitoba 23 Ontario-North 8 Ontario-South 34 Quebec 26 Manitoba 3 Ontario-South 8 Quebec 3
Page 4 of 9

http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0509melnyk/

11 record(s) selected.

DB2 Basics: An introduction to materialized query tables

01/02/2006 07:09 PM

11 record(s) selected. insert into sales values ('06/28/2005', 'LEE', 'Ontario-South', 100) refresh table sales_summary select * from sales_summary SALES_PERSON --------------... LEE LEE LEE ... REGION TOTAL_SALES --------------- ----------Ontario-North Ontario-South Quebec 8 134 26

11 record(s) selected. update sales set sales = 50 where sales_date = '06/28/2005' and sales_person = 'LEE' and region = 'Ontario-South' refresh table sales_summary select * from sales_summary SALES_PERSON --------------... LEE LEE LEE ... REGION TOTAL_SALES --------------- ----------Ontario-North Ontario-South Quebec 8 84 26

11 record(s) selected. delete from sales where sales_date = '06/28/2005' and sales_person = 'LEE' and region = 'Ontario-South' refresh table sales_summary select * from sales_summary SALES_PERSON --------------... LEE LEE LEE ... REGION TOTAL_SALES --------------- ----------Ontario-North Ontario-South Quebec 8 34 26

11 record(s) selected. connect reset

http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0509melnyk/

Page 5 of 9

DB2 Basics: An introduction to materialized query tables

01/02/2006 07:09 PM

Staging tables
You can incrementally refresh a REFRESH DEFERRED MQT if it has a staging table associated with it. The staging table collects changes that need to be applied to synchronize the MQT with its underlying tables. You can create a staging table using the CREATE TABLE statement; then, when the underlying tables of the MQT are modified, the changes are propagated and immediately appended to the staging table. The idea is to use the staging table to incrementally refresh the MQT, rather than regenerate the MQT from scratch. Incremental maintenance provides significant performance improvement. The staging table is pruned when the refresh operation is complete. After it is created, a staging table is in a pending (inconsistent) state; it must be brought out of this state before it can start collecting changes to its underlying tables. You can accomplish this by using the SET INTEGRITY statement. Listing 4 shows an example of using a staging table with a summary table. The summary table, which is named EMP_SUMMARY, is based on the underlying table EMPLOYEE in the SAMPLE database. You'll recall that the DATA INITIALLY DEFERRED clause means that data will not be inserted into the table as part of the CREATE TABLE statement. The REFRESH DEFERRED clause means that the data in the table can be refreshed at any time using the REFRESH TABLE statement. The staging table, which is named EMP_SUMMARY_S, is associated with the summary table EMP_SUMMARY. The PROPAGATE IMMEDIATE clause specifies that any changes made to the underlying table as part of an insert, update, or delete operation are cascaded to the staging table. SET INTEGRITY statements are issued against both tables to take them out of their pending states. Not unexpectedly, a query against the summary table at this point returns no data. The REFRESH TABLE statement returns a warning, a reminder that the "integrity of non-incremental data remains unverified." This, too, is not unexpected. Another query against the summary table returns no data as well. However, after we insert a new row of data into the underlying EMPLOYEE table, a query against the staging table EMP_SUMMARY_S returns one row, corresponding to the data that was just inserted. The staging table has the same three columns that its underlying summary table has, plus two additional columns that are used by the system: GLOBALTRANSID (the global transaction ID for each propagated row) and GLOBALTRANSTIME (the timestamp of the transaction). Another query against the summary table returns no data, but after the REFRESH TABLE statement executes this time, the query runs successfully. Listing 4. Using a staging table with a summary table
connect to sample ... create table emp_summary as (select workdept, job, count(*) as count from employee group by workdept, job) data initially deferred refresh deferred create table emp_summary_s for emp_summary propagate immediate set integrity for emp_summary materialized query immediate unchecked set integrity for emp_summary_s staging immediate unchecked select * from emp_summary WORKDEPT JOB COUNT -------- -------- ----------0 record(s) selected.

http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0509melnyk/

Page 6 of 9

DB2 Basics: An introduction to materialized query tables

01/02/2006 07:09 PM

0 record(s) selected. refresh table emp_summary SQL1594W Integrity of non-incremental data remains unverified by the database manager. SQLSTATE=01636 select * from emp_summary WORKDEPT JOB COUNT -------- -------- ----------0 record(s) selected. insert into employee values ('006900', 'RUSS', 'L', 'DYERS', 'D44', '1234', '1960-05-05', 'FIELDREP', 5, 'M', '1940-04-02', 10000, 100, 1000) select * from emp_summary_s WORKDEPT JOB COUNT GLOBALTRANSID GLOBALTRANSTIME -------- -------- ----------- -------------------... -----------------------------... D44 FIELDREP 1 x'00000000000000CD' x'20050822201344536158000000' 1 record(s) selected. select * from emp_summary WORKDEPT JOB COUNT -------- -------- ----------0 record(s) selected. refresh table emp_summary SQL1594W Integrity of non-incremental data remains unverified by the database manager. SQLSTATE=01636 select * from emp_summary WORKDEPT JOB COUNT -------- -------- ----------D44 FIELDREP 1 1 record(s) selected. connect reset

Summary
The SYSCAT.TABDEP system catalog view contains a row for every dependency that a materialized query table has on some other object. You can query this view to obtain a dependency summary for the MQTs that we have created (Listing 5). MQTs have a DTYPE value of 'S.' The TABNAME column lists the names of the MQTs, and the BNAME column lists the names of the database objects on which the corresponding MQTs depend. The BTYPE column identifies the object type: 'T' for table, 'I' for index, and 'F' for function instance.
http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0509melnyk/ Page 7 of 9

DB2 Basics: An introduction to materialized query tables

01/02/2006 07:09 PM

Listing 5. Querying the SYSCAT.TABDEP system catalog view to see MQT dependencies on other database objects
connect to sample ... select substr(tabname,1,24) as tabname, dtype, substr(bname,1,24) as bname, btype from syscat.tabdep where tabschema = 'MELNYK' and dtype = 'S' TABNAME -----------------------EMP EMP EMP EMP EMP_SUMMARY ONTARIO_1995_SALES_TEAM ONTARIO_1995_SALES_TEAM ONTARIO_1995_SALES_TEAM SALES_SUMMARY 9 record(s) selected. connect reset DTYPE ----S S S S S S S S S BNAME -----------------------DEPARTMENT EMPLOYEE SQL050829104058970 SQL050829104058800 EMPLOYEE LEFT1 SALES EMPLOYEE SALES BTYPE ----T T I I T F T T T

We have seen that a materialized query table, whose definition is based upon the result of a query, can be thought of as a kind of materialized view. MQTs are important because they can significantly decrease the response time for complex queries. This article has introduced you to the basic concepts around maintained by system MQTs and maintained by user MQTs, as well as summary tables and staging tables, and these concepts were illustrated by working examples that you can run yourself. To learn more about materialized query tables, or for more detailed information about any of the topics covered in this article, see the DB2 Information Center.

Resources
DB2 Universal Database for Linux, UNIX and Windows Support is the ideal place to locate resources such as the Version 8.2 Information Center and PDF product manuals. For the latest DB2 information online, including more detailed information about materialized query tables, visit the DB2 Information Center. Learn about Demystifying table and table space states in DB2 UDB. Refer to the IBM DB2 Universal Database SQL Reference, Volume 1 and IBM DB2 Universal Database SQL Reference, Volume 2 for detailed SQL documentation.

About the author
http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0509melnyk/ Page 8 of 9

DB2 Basics: An introduction to materialized query tables

01/02/2006 07:09 PM

Roman B. Melnyk , Ph.D., is a senior member of the DB2 Information Development team, specializing in database administration, DB2 utilities, and SQL. During more than nine years at IBM, Roman has written numerous DB2 books, articles, and other related materials. Roman coauthored DB2 Version 8: The Official Guide (Prentice Hall Professional Technical Reference, 2003), DB2: The Complete Reference (Osborne/McGraw-Hill, 2001), DB2 Fundamentals Certification for Dummies (Hungry Minds, 2001), and DB2 for Dummies (IDG Books, 2000).

http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0509melnyk/

Page 9 of 9