Manage dimension tables in InfoSphere Information Server DataStage

How to use the Slowly Changing Dimension stage
Skill Level: Intermediate Brian Caufield (bcaufiel@us.ibm.com) Software Architect IBM

12 Mar 2009 Information Server DataStage® Version 8.0 introduced the Slowly Changing Dimension (SCD) stage. This tutorial provides step-by-step instructions on how to use the SCD stage for processing dimension table changes. It also shows you how to use the output of the stage to update an associated fact table. The tutorial includes a fully operational download.

Section 1. Before you start
The Slowly Changing Dimension stage was added in the 8.0 release of InfoSphere Information Server DataStage. It is designed to support the activities required to populate and maintain records in star schema data models, in particular dimension table data. The Slowly Changing Dimension stage encapsulates all of the dimension maintenance logic: finding existing records, generating surrogate keys, checking for changes, and determining what action to take when changes occur. In addition, you can associate dimension record surrogate key values with source records, which eliminates the need for additional lookups in later processing.

About this tutorial

Manage dimension tables in InfoSphere Information Server DataStage © Copyright IBM Corporation 2009. All rights reserved.


developerWorks®

ibm.com/developerWorks

This tutorial is designed to introduce you to using the Slowly Changing Dimension stage on the Information Server DataStage parallel canvas. The tutorial uses a simplified example scenario that focuses on Slowly Changing Dimension functionality. Actual business scenarios may require different approaches to the job design used in this tutorial's example. The volume of data processed in the tutorial is intentionally small to make it easier to understand the processing that is taking place. The material in the SCD_Tutorial.zip file in the Download section is built to run on a Windows platform with a DB2 database. You can modify the material to run on a different platform or to use a different database.

Objectives
In this tutorial, you will learn how to design a job that uses the Slowly Changing Dimension stage to perform updating and loading of dimension and fact tables. After completion, you will be able to configure the SCD stage for history-tracking changes and in-place changes, and use the output of the stage to update an associated fact table.

Prerequisites
This tutorial is written for DataStage developers who are familiar with the DataStage Parallel Edition design canvas. You will also benefit if you already have a knowledge of star schema design concepts (including fact and dimension tables), the use of surrogate keys, and the usual methodology for updating dimension tables.

System requirements
To create the job in this tutorial, you need an Information Server DataStage 8.x installation that is licensed to use the parallel engine. You also need a DataStage Designer client and access to a DataStage project where you can create, import, compile, and run DataStage jobs. To use the sample scripts in the SCD_Tutorial.zip download, your Information Server must be installed on a Windows® OS with access to a DB2 database. However, you can also modify the scripts to work on other operating systems and with a different database.


Section 2. Star schemas and Slowly Changing Dimensions
A star schema is a data model in which the data being measured, called the facts, is stored in one table, called the fact table. Business objects are the entities that are involved in the events being measured. A business object consists of identifying information and attributes that describe the object. These objects are stored in tables called dimension tables. The facts in the fact table are linked to the business objects in the associated dimension tables using foreign keys.

Figure 1. Example Star Schema

Because fact tables record the measurements generated from business events, they tend to grow rapidly. Dimension tables, on the other hand, tend to grow or change less frequently. In the example used in this tutorial, the fact table records information about sales transactions. Every transaction results in a new row in the fact table. The product dimension in the example only grows when a new product is introduced, or if information about an existing product is changed. You typically handle changes to attribute information in one of two ways:


• Overwrite: The existing row in the dimension table is updated to contain the new attribute values; the old values are no longer available. This is commonly referred to as a Type1 change.
• Tracking History: The existing row in the dimension table is modified to indicate that it is no longer current (that is, it has been expired), and a new row is inserted with the current attribute values. This is commonly referred to as a Type2 change.

Surrogate Keys
Surrogate Keys are values that are generated specifically for the purpose of uniquely identifying dimension table rows. The primary reasons you would use a surrogate key rather than the usual business key of the object in the dimension table are:
• When tracking history in the dimension table, there will be multiple rows in the dimension table for the same business key. Therefore, it is not possible to use the business key as the primary key.
• Typical fields that are used as business keys generally don't change, but situations can arise where they do change. For example, US citizens can be assigned a new social security number, or account numbers may be reassigned after a merger. Surrogate keys provide a way for the dimension table to have a reliable, unique, and never-changing primary key.

Section 3. Tutorial scenario
The scenario used for this tutorial has one fact table and two dimension tables that will be updated. The source file contains sales transaction records. The information in the source file is used to update the fact and dimension tables.

Figure 2. Scenario schemas

Source data
The source data file is named SaleDetail.dat and is contained in the SCD_Tutorial.zip download. It contains five records that, when processed, apply changes to the fact and dimension tables. Table 1 shows the contents of the file.

Table 1. Source data
StoreId  StoreName  StoreMgr    ProdSKU     ProdBrand  ProdDescr      SaleAmt   SaleUnits
A1111    Stuff      Washington  1111111111  Bob's      Red box        00436.38  7
A1112    MoreStuff  Adams       2222222222  Squeaky    Blue Chair     00308.56  14
A1113    Stuffy's   Jefferson   3333333333  Sunshine   Yellow Duckie  00203.87  2
A1114    McStuff    Madison     4444444444  AAAAA      fork           00024.14  13
A1115    Stuff Jr.  Monroe      5555555555  Best       lawn mower     00456.40  11

Product dimension
The product dimension is a table in the target database. Initially this table contains records for three products. When the source data is processed, the table is updated to contain new product records, and to track the history of changed product information. The Setup.bat file in the SCD_Tutorial.zip download contains a script that creates and populates this table with the data shown in Table 2.

Table 2. Initial product dimension data
ProdSK  SKU         Brand     Descr          Curr  EffDate     ExpDate
1       3333333333  Sunshine  Yellow Duckie  Y     2004-01-01  2099-12-31
2       4444444444  AAAAA     spoon          Y     2004-01-01  2099-12-31
10      5555555555  AAAAA     grass cutter   Y     2004-01-01  2099-12-31

Store dimension
The store dimension is a table in the target database. Initially this table contains records for three stores. When the source data is processed, the table is updated to contain new store records, and to overwrite changed store information. The Setup.bat file in the SCD_Tutorial.zip download contains a script that creates and populates this table with the data shown in Table 3.

Table 3. Initial store dimension data
StoreSK  ID     Name       Mgr
1        A1113  Stuffy's   Jefferson
2        A1114  McStuff    Adams
5        A1115  Lil Stuff  Monroe

Fact table
The fact table is a table in the target database. Initially this table contains no records. When the source data is processed, the table is updated with the sales facts and references to the corresponding dimension records. The Setup.bat file in the SCD_Tutorial.zip download contains a script that creates the table as shown in Table 4.

Table 4. Initial Fact table data
ProdSK  StoreSK  SaleAmt  SaleUnits
(no records)
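To make the history-tracking update of the product dimension concrete, the following is a minimal Python sketch, not the SCD stage's actual implementation. It applies the source product 4444444444 (now described as "fork") to the matching row from Table 2. The function name apply_product_change, the choice to treat both Brand and Descr as history-tracked, the use of the processing date as the expiration date, and the next surrogate key value of 11 are all assumptions made for illustration.

```python
from datetime import date

def apply_product_change(dim_rows, source, next_sk, today):
    """Apply one source product to the dimension rows (hypothetical helper).

    Returns the rows that changed: none for an unchanged product, one for a
    new product, two for a history-tracking (Type2) change.
    """
    current = next((r for r in dim_rows
                    if r["SKU"] == source["SKU"] and r["Curr"] == "Y"), None)
    changes = []
    if current is not None:
        if (current["Brand"], current["Descr"]) == (source["Brand"], source["Descr"]):
            return changes  # unchanged product: no action needed
        # Expire the existing row (assumed convention: Curr = 'N', ExpDate = today).
        current["Curr"] = "N"
        current["ExpDate"] = today
        changes.append(current)
    # Insert a new current row for a new or changed product.
    new_row = {"ProdSK": next_sk, "SKU": source["SKU"], "Brand": source["Brand"],
               "Descr": source["Descr"], "Curr": "Y",
               "EffDate": today, "ExpDate": date(2099, 12, 31)}
    dim_rows.append(new_row)
    changes.append(new_row)
    return changes

# One row from Table 2 and the matching product from the source data.
product_dim = [
    {"ProdSK": 2, "SKU": "4444444444", "Brand": "AAAAA", "Descr": "spoon",
     "Curr": "Y", "EffDate": date(2004, 1, 1), "ExpDate": date(2099, 12, 31)},
]
source_product = {"SKU": "4444444444", "Brand": "AAAAA", "Descr": "fork"}

changes = apply_product_change(product_dim, source_product,
                               next_sk=11, today=date(2009, 3, 12))
for row in product_dim:
    print(row["ProdSK"], row["Descr"], row["Curr"], row["EffDate"], row["ExpDate"])
```

After the call, the dimension holds two rows for the same SKU: the expired "spoon" row and a new current "fork" row with its own surrogate key, which is why the business key alone cannot serve as the primary key.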

Section 4. Setting up the tutorial
To set up the tutorial, save the SCD_Tutorial.zip file from the Download section to your local file system and follow these steps:
1. Check if you already have the following directory structure: C:\IBM\Demo\DataStage. If not, create it.
2. Extract the contents of SCD_Tutorial.zip into C:\IBM\Demo\DataStage. Be sure to select the option in your extraction program that indicates you want to use the folder or directory names when extracting. You should end up with the directory C:\IBM\Demo\DataStage\SCD, which contains several files and an empty sub-directory named SKG.
3. Run C:\IBM\Demo\DataStage\SCD\setup.bat.
4. Using the DataStage Designer client, import C:\IBM\Demo\DataStage\SCD\SCD_Tutorial.dsx into your DataStage project.
5. In the DataStage Administrator client, set the environment variable APT_DB2INSTANCE_HOME to the location where the db2nodes.cfg file exists. Typically this is C:\IBM\SQLLIB\DB2. This configures the project to access DB2 as a source or target for the DB2 Enterprise Stage.

Verify the state of the database
Run the Results executable shortcut in the C:\IBM\Demo\DataStage\SCD directory. This displays the contents of the product and store dimensions as well as the fact table. Review the output to verify that the tables have been initialized properly.

Resetting the tutorial
Once the tutorial has been run the first time, the contents of the database will have changed. Therefore, subsequent runs would see different behavior. If you want to reset the database tables back to their initial state, run the zReset executable shortcut in the C:\IBM\Demo\DataStage\SCD directory.

Initializing the surrogate keys

The tutorial uses surrogate key generators that use state files to record the key values that have been used. This ensures that unique values are always generated. Because the dimension tables are created with data in them, you need to make the surrogate key generators aware of what values have already been used. Compile and run the Demo\DataStage\Slowly Changing Dimensions\Surrogate Key Generation\CreateAndUpdate_File job to initialize the state files. The job reads the product dimension table and the store dimension table, then creates and updates the respective surrogate key generator state files.

Section 5. Building the Slowly Changing Dimensions job
In this step you build a job that reads the SaleDetail.dat source file, updates the product and store dimensions, and inserts records into the fact table. Draw the job design as illustrated below in Figure 3. For reference, a completed version of the job named Demo\DataStage\Slowly Changing Dimensions\SCD_All is included in the download.

Figure 3. Job design

The primary flow of records is from left to right in the job design. The source records are read from SaleDetail.dat, passed to the first SCD stage to process the Product dimension, then passed to the next SCD stage to process the store dimension, and finally to the fact table. No records are added or removed on this flow of data. Every record read from the source is inserted into the fact table. As part of the processing in the SCD stages, the surrogate key values that are associated with the source records are obtained from the dimension table and added to the data being passed to the fact table.

Looking at the job design from top to bottom, the product and store dimension tables are reference sources to the SCD stages. These tables are used to initialize the lookup cache. Only records that are considered current are stored in the lookup cache. Any historical records in the dimension tables are automatically filtered out during initial processing. The SCD stage uses the data values from the primary input link to look up values in the cache and check for changes. If any changes are required to the dimension table, they are written to the secondary output link of the SCD stage, which is called the dimension update link. The change is either an insert or an update. Each record on the primary input link of the SCD stage will go out on the primary output link, and may produce zero, one, or two records on the dimension update link. The number of records produced depends on what, if any, action needs to be taken on the dimension table.
• Zero records: Unchanged records require no action to the dimension table, so no records are written on the dimension update link.
• One record: New records and overwriting updates (Type1) require a one row change to the dimension table. One record is written on the dimension update link to reflect these types of changes.
• Two records: Changed records that are tracking history (Type2) require a two row change to the dimension table. The existing record must be updated to reflect that it is no longer current, and a new record must be inserted for the new set of values. Two records are written to the dimension update link to reflect these changes.

Target database stages are connected to the dimension update link to apply the changes to the actual dimension table in the database.

Configuring the stages
Now that you have built the high level job design, you are ready to perform the next set of steps in which you:
• Configure the individual stages to access the source data.
• Process the dimension tables.
• Update the fact table.

Configure the primary source stage
The source stage must be configured to read the SaleDetail.dat file. Complete the following steps to configure the SaleDetail sequential file stage:
1. On the Output|Properties tab, set the File property to C:\IBM\Demo\DataStage\SCD\SaleDetail.dat.
2. On the Output|Format tab, remove the Final delimiter property.
3. On the Output|Format tab, add the Record delimiter string property and set it to DOS Format.
4. Load the Demo\DataStage\Slowly Changing Dimensions\TableDefs\SaleDetail table definition onto the output link.

Figure 4. Source stage

The source stage should now be configured to read the SaleDetail.dat file. Use View Data to confirm that the data is being read from the file properly.

Configure the stages to process the Product dimension
Three stages are used to process the Product dimension. Reading the job design from top to bottom:
• The first stage specifies how to read the data from the dimension table.
• The SCD stage determines what changes need to be made to the dimension table and those changes are written to the dimension update link.
• The dimension update link is connected to the dimension update target stage, which specifies how to update the actual database table with the data produced by the SCD stage.

Configure the Product dimension source stage
Complete the following steps to configure the Product dimension DB2 Enterprise stage:

1. On the Output|Properties tab, set the Read method property to Table.
2. On the Output|Properties tab, set the Table property to SCD.ProdDim.
3. On the Output|Properties tab, set the Use Default Database and Use Default Server properties to False.
4. On the Output|Properties tab, set the Server property to DB2.
5. On the Output|Properties tab, set the Database property to SCDDemo.
6. Load the Demo\DataStage\Slowly Changing Dimensions\TableDefs\SCD.ProdDim table definition onto the output link.

Figure 5. Product dimension source

The stage should now be configured to read the SCD.ProdDim table. Use View Data to confirm that the data is being read from the database properly.

Configure the Product dimension SCD stage
The Fast Path control of the SCD stage editor lets you navigate directly to the tabs that require input in order to complete the stage configuration. The control is in the lower left corner of the editor. Use the arrow buttons to move forward or backward through the tabs. Open the product dimension SCD stage editor and use the Fast Path control to set the properties as shown:

Fast Path control

The SCD stage has two input links and two output links. This results in a high number of property link-tab combinations. Use the Fast Path control to move directly to the tabs that are required to configure the stage.

• Fast Path page 1: Setting the output link
By default, the first output link connected to the stage is used as the primary output link. Look at the link name that is displayed in the Select output link property. Use the drop down list to select the output link that is leading to the next SCD stage. This is the primary output of the stage. The other link automatically becomes the dimension update link.

Figure 6. Product dimension SCD stage, Fast Path page 1

• Fast Path page 2: Define the lookup condition and purpose codes
The first task on this page is to define what the various columns of the dimension table are used for. This information is used in a number of ways in the SCD processing. The choices for purpose codes are:

• (blank): This column is not used for anything with respect to SCD processing. Data for this field is inserted into the table when a new row is inserted, but this column will not be checked for changes against the source data.
• Surrogate Key: This column is the primary key of the dimension table and is populated with a surrogate key value.
• Business Key: This column is the identifier of the business objects that the dimension table is representing, but is not the primary key of the dimension table. This column is typically used as a lookup column and corresponds to a key or some other field of the source data that identifies the associated business object.
• Type 1: Check this column for a change in value. If the value has changed, perform an overwriting change to the dimension table.
• Type 2: Check this column for a change in value. If the value has changed, perform a history tracking change to the dimension table.
• Current Indicator: This column is used as a flag to indicate whether it is the most current record for a particular business key.
• Effective Date: This column is used to specify when a record first became the most current record, that is, when it became the active record.
• Expiration Date: This column is used to specify the ending date of when a record was the active record. For currently active records, this value is typically a future date or NULL.
• SK Chain: This column is used to store the surrogate key of the previous or next record in the history for a particular business key.

Set purpose codes for the columns as shown below in Figure 7. Because this dimension table is tracking history, it contains columns to track whether a row is current and the date range for when it was current. Click on the ProdSKU source field and drag it to the SKU dimension column to create the lookup condition. The lookup is used to find the dimension table row that corresponds to a source data row.

Figure 7. Product dimension SCD stage, Fast Path page 2

Although this tab looks similar to a mapping tab, it is actually defining the lookup keys from the source record to the dimension record. Any source column can be associated with any one dimension column. This creates an equality lookup condition between those columns. If more than one source column is associated with a dimension column, then those equality conditions are AND'ed together. In this manner, multi-column lookup keys can be used.

• Fast Path page 3: Configuring the surrogate key generator
This tab specifies how surrogate keys are generated for this stage. Surrogate key generation capabilities are integrated into the SCD stage. Surrogate key generation can use DataStage's file based surrogate generation, or use DB2 or Oracle database sequence object based generation. This tutorial uses the file based method. Set the Source name property to C:\IBM\Demo\DataStage\SCD\SKG\ProdDim as shown in Figure 8. This is the surrogate key state file you created by running the Demo\DataStage\Slowly Changing Dimensions\Surrogate Key Generation\CreateAndUpdate_File job. Leave the defaults for the other properties unchanged.

Figure 8. Product dimension SCD stage, Fast Path page 3

• Fast Path page 4: Defining the slowly changing dimension behavior and derivations
The DimUpdate tab is used to define several critical elements of SCD processing. The Derivation column is used to specify how to map elements of a source row to elements of the dimension table. The Expire column is used to specify what values need to change if an existing record needs to be expired. Expire expressions are only enabled when there are Type2 columns specified, and are only available for Current Indicator and Expiration Date columns.

If no matching record is found when the lookup is performed, the derivation expressions are applied and a record is written on the dimension update link to indicate a new record needs to be added to the dimension table. If a matching record is found, the derivation expressions are applied to the source columns, and then the results are compared to the corresponding columns of the dimension table. Columns specified as Type2 are compared first. If there is a change, two records are written on the dimension update link. The first record is an update record, to expire the matched row. The Expire expressions are used to calculate the values for the update row. The second record is a new record that contains all of the new values for all columns. If no Type2 columns have changed, the Type1 columns are compared. If there are any changes, one record is written on the dimension update link that indicates an update to the dimension table. The derivation expressions are used to calculate the values for the update record.

Set the Derivation expressions and the Expire expressions as shown below in Figure 9.

Figure 9. Product dimension SCD stage, Fast Path page 4

Note that you are specifying these properties on the dimension update link. The output columns for this link were automatically propagated with their purpose codes from the dimension input link. The SCD stage only does this when the set of columns on the dimension update link is empty. It is possible to load a set of columns directly on the dimension update link; however, they must exactly match those specified on the dimension input link.

• Fast Path page 5: Selecting the columns for Output Link
The Output Map tab is used to define what columns will leave this stage on the primary output link. This tab operates much like the Mapping tab of other stages. The only difference is that you can select columns from the primary input link and columns from the reference link to output. The columns coming from the primary source have the same values they entered the stage with. The columns coming from the reference link represent the values from the dimension table that correspond to the source row. Note that because the SCD processing has been done by the stage, every record from the primary source data will have a corresponding record in the dimension. The output link is initially empty. Create and map the output columns by dragging and dropping from the source to the target.

Select the columns for output as shown below in Figure 10. Because the product dimension has now been processed, the source columns that contain those attributes are no longer needed. Instead, the primary key associated with the source row is appended because that is the value that is required to be inserted into the fact table.

Figure 10. Product dimension SCD stage, Fast Path page 5

The stage is now configured to perform the dimension maintenance on the Product dimension table.

Configure the Product dimension target stage
This stage processes the dimension update link records produced by the product dimension SCD stage to update the actual dimension table in the database. Because incoming records represent both inserts and updates to the table, an Upsert write method must be used. Auto-generated update and insert statements take the purpose codes specified in the SCD stage into account to generate the correct update statement for this usage. Complete the following steps to configure the Product dimension update DB2 Enterprise stage:
1. On the Input|Properties tab, set the Write Method property to Upsert.
2. On the Input|Properties tab, set the Upsert Mode property to Auto-generated Update and Insert.
3. On the Input|Properties tab, set the Table property to SCD.ProdDim.
4. On the Input|Properties tab, set the Use Default Database and Use Default Server to False.
5. On the Input|Properties tab, set the Server property to DB2.
6. On the Input|Properties tab, set the Database property to SCDDemo.

Figure 11. Product dimension target
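The effect of an upsert on the dimension table can be illustrated outside DataStage. The sketch below uses Python's sqlite3 module with an in-memory stand-in for the ProdDim table; it is not the SQL that the DB2 Enterprise stage generates. The upsert helper, the update-then-insert order, and the choice to key the update on the surrogate key are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE ProdDim (
    ProdSK INTEGER PRIMARY KEY, SKU TEXT, Brand TEXT, Descr TEXT,
    Curr TEXT, EffDate TEXT, ExpDate TEXT)""")
conn.execute("INSERT INTO ProdDim VALUES "
             "(2, '4444444444', 'AAAAA', 'spoon', 'Y', '2004-01-01', '2099-12-31')")

def upsert(conn, rec):
    """Try an UPDATE keyed on the surrogate key; INSERT if no row matched."""
    cur = conn.execute(
        "UPDATE ProdDim SET SKU = :SKU, Brand = :Brand, Descr = :Descr, "
        "Curr = :Curr, EffDate = :EffDate, ExpDate = :ExpDate "
        "WHERE ProdSK = :ProdSK", rec)
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO ProdDim (ProdSK, SKU, Brand, Descr, Curr, EffDate, ExpDate) "
            "VALUES (:ProdSK, :SKU, :Brand, :Descr, :Curr, :EffDate, :ExpDate)", rec)

# Two hypothetical dimension update records for one history-tracking change:
# the first expires the existing row, the second adds the new current row.
update_records = [
    {"ProdSK": 2, "SKU": "4444444444", "Brand": "AAAAA", "Descr": "spoon",
     "Curr": "N", "EffDate": "2004-01-01", "ExpDate": "2009-03-12"},
    {"ProdSK": 11, "SKU": "4444444444", "Brand": "AAAAA", "Descr": "fork",
     "Curr": "Y", "EffDate": "2009-03-12", "ExpDate": "2099-12-31"},
]
for rec in update_records:
    upsert(conn, rec)

rows = conn.execute(
    "SELECT ProdSK, Descr, Curr FROM ProdDim ORDER BY ProdSK").fetchall()
print(rows)
```

The same stream of records carries both kinds of change, which is why a plain insert-only or update-only write method would not be sufficient for the dimension update link.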

The stage is now configured to write to the SCD.ProdDim dimension table.

Configure the stages to process the Store dimension

Configure the Store dimension source stage
Complete the following steps to configure the Store dimension DB2 Enterprise stage:
1. On the Output|Properties tab, set the Read Method property to Table.
2. On the Output|Properties tab, set the Table property to SCD.StoreDim.
3. On the Output|Properties tab, set the Use Default Database and Use Default Server to False.
4. On the Output|Properties tab, set the Server property to DB2.
5. On the Output|Properties tab, set the Database property to SCDDemo.
6. Load the Demo\DataStage\Slowly Changing Dimensions\TableDefs\SCD.StoreDim table definition onto the output link.

Figure 12. Store dimension source stage

The stage should now be configured to read the SCD.StoreDim table. Use View Data to confirm that the data is being read from the database properly.

Configure the Store dimension SCD stage
Open the store dimension SCD stage editor and use the Fast Path control to set the properties as shown:

• Fast Path page 1: Setting the Output Link
Use the Select output link drop down list to select the link leading to the fact table. This is the primary output of the stage. The other link automatically becomes the dimension update link.

Figure 13. Store dimension SCD stage, Fast Path page 1

• Fast Path page 2: Define the lookup condition and purpose codes
Set purpose codes for the columns as shown below in Figure 14. Because this dimension table is not tracking history, it does not contain columns to track whether a row is current or not. The Name column has a blank purpose code, which indicates that this column will not be checked for changes. Click on the StoreId source field and drag it to the dimension column Id to create the lookup condition.

Figure 14. Store dimension SCD stage, Fast Path page 2
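The way purpose codes drive change detection for the store dimension can be sketched in a few lines of Python. This is an illustration only, using the store values from Tables 1 and 3; the PURPOSE mapping (in particular, treating Mgr as a Type 1 column) and the classify function are assumptions, since the actual codes are set in the stage editor.

```python
# Assumed purpose codes for the store dimension columns (hypothetical mapping).
PURPOSE = {"StoreSK": "Surrogate Key", "Id": "Business Key", "Name": "", "Mgr": "Type 1"}

# Lookup cache keyed on the business key, built from Table 3.
store_dim = {
    "A1113": {"StoreSK": 1, "Id": "A1113", "Name": "Stuffy's", "Mgr": "Jefferson"},
    "A1114": {"StoreSK": 2, "Id": "A1114", "Name": "McStuff", "Mgr": "Adams"},
    "A1115": {"StoreSK": 5, "Id": "A1115", "Name": "Lil Stuff", "Mgr": "Monroe"},
}

def classify(source):
    """Return the action a source store record would need on the dimension."""
    dim_row = store_dim.get(source["Id"])  # equality lookup StoreId -> Id
    if dim_row is None:
        return "insert"
    type1_cols = [col for col, code in PURPOSE.items() if code == "Type 1"]
    if any(dim_row[col] != source[col] for col in type1_cols):
        return "overwrite"
    return "no change"  # columns with a blank purpose code are never compared

sources = [
    {"Id": "A1111", "Name": "Stuff", "Mgr": "Washington"},
    {"Id": "A1114", "Name": "McStuff", "Mgr": "Madison"},
    {"Id": "A1115", "Name": "Stuff Jr.", "Mgr": "Monroe"},
]
results = {s["Id"]: classify(s) for s in sources}
print(results)
```

Under these assumptions, store A1115 is reported as unchanged even though its name differs from the dimension row, because Name carries a blank purpose code.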

• Fast Path page 3: Configuring the surrogate key generator
Set the file path property to C:\IBM\Demo\DataStage\SCD\SKG\StoreDim as shown in Figure 15. This is the surrogate key state file you created by running the Demo\DataStage\Slowly Changing Dimensions\Surrogate Key Generation\CreateAndUpdate_File job. Leave the defaults for the other properties.

Figure 15. Store dimension SCD stage, Fast Path page 3

• Fast Path page 4: Defining the slowly changing dimension behavior and derivations
Set the Derivation expressions as shown below in Figure 16. Because the Name column has no purpose code, the SCD stage does not check this column for changes when a matching dimension record is found on the lookup. Because there are no Type2 columns in this dimension table, the Expire expression is not enabled for any column.

Figure 16. Store dimension SCD stage, Fast Path page 4

• Fast Path page 5: Selecting the columns for Output Link
Select the columns for output as shown below in Figure 17. Because the store dimension has now been processed, the source columns that contain those attributes are no longer needed. Instead, the surrogate key associated with the source row is appended because that is the value that is required to be inserted into the fact table.

Figure 17. Store dimension SCD stage, Fast Path page 5
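The output mapping described for page 5 amounts to dropping the store attributes and carrying the surrogate key forward. A minimal Python sketch of that idea follows; the to_fact_row function and the shape of the incoming record are hypothetical, and the sample values are illustrative. The column names follow Table 4.

```python
def to_fact_row(record, store_sk):
    """Keep the measurements and surrogate keys; drop the store attributes."""
    return {
        "ProdSK": record["ProdSK"],      # appended earlier by the product SCD stage
        "StoreSK": store_sk,             # looked up from the store dimension
        "SaleAmt": record["SaleAmt"],
        "SaleUnits": record["SaleUnits"],
    }

# A record as it might arrive from the product SCD stage (illustrative values).
incoming = {"StoreId": "A1113", "StoreName": "Stuffy's", "StoreMgr": "Jefferson",
            "ProdSK": 1, "SaleAmt": "00203.87", "SaleUnits": 2}

fact_row = to_fact_row(incoming, store_sk=1)
print(fact_row)
```

Because the surrogate key travels with the record, the fact table load needs no further lookup against the dimension tables.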

The stage is now configured to perform the dimension maintenance on the store dimension table.

Configure the Store dimension target stage

This stage processes the dimension update records produced by the store dimension SCD stage to update the actual dimension table in the database. Complete the following steps to configure the Store dimension target DB2 Enterprise stage:

1. On the Input|Properties tab, set the Write Method property to Upsert.
2. On the Input|Properties tab, set the Upsert Mode property to Auto-generated Update and Insert.
3. On the Input|Properties tab, set the Table property to SCD.StoreDim.
4. On the Input|Properties tab, set the Use Default Database and Use Default Server properties to False.
5. On the Input|Properties tab, set the Database property to SCDDemo.
6. On the Input|Properties tab, set the Server property to DB2.

Figure 18. Store dimension target stage

The stage is now configured to write to the SCD.StoreDim dimension table.

Configure the Fact table target stage

This stage processes the source records that have been passed through the primary output links to update the actual fact table in the database. At this point, the original input source records have been processed so that the only columns on this link are the measurements (SaleAmt and SaleUnits) and the surrogate key values for the associated Product and Store. Complete the following steps to configure the Fact table target DB2 Enterprise stage:

1. On the Input|Properties tab, set the Write Method property to Write.
2. On the Input|Properties tab, set the Write Mode property to Append.
3. On the Input|Properties tab, set the Table property to SCD.Facttbl.
4. On the Input|Properties tab, set the Use Default Database and Use Default Server properties to False.
5. On the Input|Properties tab, set the Database property to SCDDemo.
6. On the Input|Properties tab, set the Server property to DB2.

Figure 19. Fact table target stage

The stage is now configured to write to the SCD.Facttbl fact table.

Final steps

You have now completed the job design and are ready to compile. Click the Compile button to start the compile. If any compile errors occur, check your job and stages against the settings specified in the tutorial and make any necessary changes. Note that the SCD stage processing makes use of the transform operator, so for the job to compile successfully, the C++ compiler settings for the project must be correct. See the Information Server Configuration Guide for details on how to configure the environment correctly for your C++ compiler. The Resources section contains a link to the corresponding article in the information center for IBM Information Server.
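The two target stages configured in this section use different write behaviors: an upsert for the dimension table and a plain append for the fact table. The following sketch contrasts the two using Python's built-in sqlite3 module in place of DB2. It is an illustration under stated assumptions, not the SQL that the DB2 Enterprise stage generates: the function names are hypothetical, the tables are unqualified stand-ins for SCD.StoreDim and SCD.Facttbl, and keying the update on StoreSK is an assumption.

```python
import sqlite3


def upsert_store(conn, rec):
    """Update the row with a matching key; insert it if no row was updated."""
    cur = conn.execute(
        "UPDATE StoreDim SET Id = :Id, Name = :Name, Mgr = :Mgr "
        "WHERE StoreSK = :StoreSK", rec)  # key column is an assumption
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO StoreDim (StoreSK, Id, Name, Mgr) "
            "VALUES (:StoreSK, :Id, :Name, :Mgr)", rec)


def append_fact(conn, rec):
    """Append mode: every incoming record becomes a new fact row."""
    conn.execute(
        "INSERT INTO Facttbl (ProdSK, StoreSK, SaleAmt, SaleUnits) "
        "VALUES (:ProdSK, :StoreSK, :SaleAmt, :SaleUnits)", rec)


# In-memory stand-ins for the tutorial's tables; all values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StoreDim "
             "(StoreSK INTEGER PRIMARY KEY, Id TEXT, Name TEXT, Mgr TEXT)")
conn.execute("CREATE TABLE Facttbl "
             "(ProdSK INTEGER, StoreSK INTEGER, SaleAmt REAL, SaleUnits INTEGER)")

store = {"StoreSK": 1, "Id": "S1", "Name": "First store", "Mgr": "Old manager"}
upsert_store(conn, store)                            # no match: inserts
upsert_store(conn, {**store, "Mgr": "New manager"})  # match: updates in place

sale = {"ProdSK": 1, "StoreSK": 1, "SaleAmt": 10.0, "SaleUnits": 1}
append_fact(conn, sale)
append_fact(conn, sale)                              # appended again, no key check
```

After these calls the dimension stand-in holds a single row for StoreSK 1 with the new manager, while the fact stand-in holds two rows, which is why only the dimension target needs the upsert setting.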

Section 5. Running the tutorial

At this point, you are ready to compile and run the job. Run the Results executable shortcut in the C:\IBM\Demo\DataStage\SCD directory to see the initial contents of the database tables. The Results shortcut displays the contents of the product dimension, the store dimension, and the fact table. Run the job by clicking the Run button in the DataStage Designer. After the job finishes successfully, run the Results shortcut again to see the changes that were made to the database tables.

Summary of changes to database tables

The contents of the database tables should now appear as follows:

• The product dimension has two update records and four new records. Two of the new records are new objects to the dimension table, and two existing records had Type2 changes, resulting in the two updates and two of the new records.

Change              ProdSK  SKU         Brand     Descr          Curr  EffDate         ExpDate
No Change           1       3333333333  Sunshine  Yellow Duckie  Y     2004-01-01      2099-12-31
Expired (Type2)     2       4444444444  AAAAA     spoon          N     2004-01-01      {Today's Date}
Expired (Type2)     10      5555555555  AAAAA     grass cutter   N     2004-01-01      {Today's Date}
New Record          3       1111111111  Bob's     Red Box        Y     {Today's Date}  2099-12-31
New Record          4       2222222222  Squeaky   Blue Chair     Y     {Today's Date}  2099-12-31
New Record (Type2)  5       4444444444  AAAAA     fork           Y     {Today's Date}  2099-12-31
New Record (Type2)  6       5555555555  Best      lawn mower     Y     {Today's Date}  2099-12-31

• The store dimension has one updated record and two new records. The updated record had a Type1 change and the two new records are new objects to the dimension table.

Change      StoreSK  ID     Name       Mgr
No Change   1        A1113  Stuffy's   Jefferson
Update      2        A1114  McStuff    Madison
No Change   5        A1115  Lil Stuff  Monroe
New Record  3        A1111  Stuff      Washington
New Record  4        A1112  MoreStuff  Adams

• The fact table has five new records, one for each source record processed. The surrogate key values in this table correspond to the current records in the dimension tables.

ProdSK  StoreSK  SaleAmt  SaleUnits
3       3        436.87   13
4       4        24.56    14
1       1        203.14   7
5       2        456.38   2
6       5        308.40   11

The contents of the dimension tables have now changed. If you were to run the job again, what results would you expect to see? Hint: The dimension tables and the source file are now in sync.

To reset the database tables to their original state, run the zReset executable shortcut.

This completes the Slowly Changing Dimensions tutorial.

Conclusion

You can use the Slowly Changing Dimension stage to greatly reduce the time you spend creating jobs for processing star schemas. In this tutorial you have learned how to configure the Slowly Changing Dimension stage to process history-tracking changes and in-place changes to dimension tables. You have also seen how you can reduce fact table processing by augmenting the source data with associated dimension table surrogate keys that eliminate the need for an additional lookup.
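As a recap of the history-tracking case, here is a minimal Python sketch of what a Type2 change does to a dimension table: the current row is expired and a new current row is added under a new surrogate key. It is a sketch under stated assumptions, not the stage's implementation. The function name apply_type2 and the sample values are hypothetical, and treating both Brand and Descr as Type2 columns is an assumption. The far-future expiration date 2099-12-31 is the one shown in the product dimension results.

```python
from datetime import date
from itertools import count

HIGH_DATE = date(2099, 12, 31)  # expiration date used for current rows


def apply_type2(rows, source, key_source, today):
    """Apply one source record to the dimension rows; return its surrogate key."""
    current = next((r for r in rows
                    if r["SKU"] == source["SKU"] and r["Curr"] == "Y"), None)
    if current is not None:
        unchanged = (current["Brand"], current["Descr"]) == (
            source["Brand"], source["Descr"])
        if unchanged:
            return current["ProdSK"]  # in sync: nothing to write
        # Type2 change: expire the existing row instead of overwriting it.
        current["Curr"] = "N"
        current["ExpDate"] = today
    # New object or new version: add a current row with a new surrogate key.
    new_row = {"ProdSK": next(key_source), "SKU": source["SKU"],
               "Brand": source["Brand"], "Descr": source["Descr"],
               "Curr": "Y", "EffDate": today, "ExpDate": HIGH_DATE}
    rows.append(new_row)
    return new_row["ProdSK"]


# Hypothetical data, not the tutorial's tables.
rows = [{"ProdSK": 1, "SKU": "P1", "Brand": "Acme", "Descr": "widget",
         "Curr": "Y", "EffDate": date(2004, 1, 1), "ExpDate": HIGH_DATE}]
keys = count(2)
today = date(2009, 3, 12)
changed = {"SKU": "P1", "Brand": "Acme", "Descr": "gadget"}

first_key = apply_type2(rows, changed, keys, today)   # expires 1, adds 2
second_key = apply_type2(rows, changed, keys, today)  # second run: no change
```

In this sketch the second call returns the same key and leaves the rows untouched, because the dimension and the source already agree.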


Downloads

Description                                        Name              Size  Download method
Supporting scripts and DS jobs for this tutorial   SCD_Tutorial.zip  16KB  HTTP

Information about download methods

Resources

Learn

• In the InfoSphere area on developerWorks, get the resources you need to advance your InfoSphere product skills.
• C++ compiler for job development topic in the information center for IBM Information Server.
• Browse the technology bookstore for books on these and other technical topics.

Get products and technologies

• Download IBM product evaluation versions and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.

Discuss

• Participate in the discussion forum for this content.
• Check out developerWorks blogs and get involved in the developerWorks community.

About the author

Brian Caufield

Brian Caufield is a software architect in IBM Silicon Valley Lab. Brian has been working in the DataStage development organization for 10 years and was involved in the design of the Slowly Changing Dimension Stage.
