Manage dimension tables in InfoSphere Information Server DataStage

How to use the Slowly Changing Dimension stage
Skill Level: Intermediate

Brian Caufield (bcaufiel@us.ibm.com)
Software Architect
IBM

12 Mar 2009

Information Server DataStage® Version 8.0 introduced the Slowly Changing Dimension (SCD) stage. This tutorial provides step-by-step instructions on how to use the SCD stage for processing dimension table changes. It also shows you how to use the output of the stage to update an associated fact table. The tutorial includes a fully operational download.

Section 1. Before you start
The Slowly Changing Dimension stage was added in the 8.0 release of InfoSphere Information Server DataStage. It is designed to support the activities required to populate and maintain records in star schema data models, specifically dimension table data. The Slowly Changing Dimension stage encapsulates all of the dimension maintenance logic: finding existing records, generating surrogate keys, checking for changes, and determining what action to take when changes occur. In addition, you can associate dimension record surrogate key values with source records, which eliminates the need for additional lookups in later processing.

About this tutorial

Manage dimension tables in InfoSphere Information Server DataStage © Copyright IBM Corporation 2009. All rights reserved.


developerWorks®

ibm.com/developerWorks

This tutorial is designed to introduce you to using the Slowly Changing Dimension stage on the Information Server DataStage parallel canvas. The tutorial uses a simplified example scenario that focuses on Slowly Changing Dimension functionality. Actual business scenarios may require different approaches to the job design used in this tutorial's example. The volume of data processed in the tutorial is intentionally small to make it easier to understand the processing that is taking place. The material in the SCD_Tutorial.zip file in the Download section is built to run on a Windows platform with a DB2 database. You can modify the material to run on a different platform or to use a different database.

Objectives
In this tutorial, you will learn how to design a job that uses the Slowly Changing Dimension stage to update and load dimension and fact tables. After completing it, you will be able to configure the SCD stage for history-tracking changes and in-place changes, and use the output of the stage to update an associated fact table.

Prerequisites
This tutorial is written for DataStage developers who are familiar with the DataStage Parallel Edition design canvas. You will also benefit if you are already familiar with star schema design concepts (including fact and dimension tables), the use of surrogate keys, and the usual methodology for updating dimension tables.

System requirements
To create the job in this tutorial, you need an Information Server DataStage 8.x installation that is licensed to use the parallel engine. You also need a DataStage Designer client and access to a DataStage project where you can create, import, compile, and run DataStage jobs. To use the sample scripts in the SCD_Tutorial.zip download, your Information Server must be installed on a Windows® OS with access to a DB2 database. However, you can also modify the scripts to work on other operating systems and with a different database.


Section 2. Star schemas and Slowly Changing Dimensions
Star schemas are a method of data modeling in which the data being measured, called the facts, is stored in one table, called the fact table. Business objects are the entities that are involved in the events being measured. They consist of identifying information and attributes that describe the object, and they are stored in tables called dimension tables. The facts in the fact table are linked to the business objects in the associated dimension tables using foreign keys.

Figure 1. Example Star Schema

Because fact tables record the measurements generated from business events, they tend to grow rapidly. Dimension tables, on the other hand, tend to grow or change less frequently. In the example used in this tutorial, the fact table records information about sales transactions. Every transaction results in a new row in the fact table. The product dimension in the example only grows when a new product is introduced, or if information about an existing product is changed. You typically handle changes to attribute information in one of two ways:


• Overwrite: The existing row in the dimension table is updated to contain the new attribute values. The old values are no longer available. This is commonly referred to as a Type1 change.
• Tracking History: The existing row in the dimension table is modified to indicate that it is no longer current (that is, it has been expired), and a new row is inserted with the current attribute values. This is commonly referred to as a Type2 change.

Surrogate Keys
Surrogate Keys are values that are generated specifically for the purpose of uniquely identifying dimension table rows. The primary reasons you would use a surrogate key rather than the usual business key of the object in the dimension table are:
• When tracking history in the dimension table, there will be multiple rows in the dimension table for the same business key. Therefore, it is not possible to use the business key as the primary key.
• Typical fields that are used as business keys generally don't change, but situations can arise where they do change. For example, US citizens can be assigned a new social security number, or account numbers may be reassigned after a merger. Surrogate keys provide a way for the dimension table to have a reliable, unique, and never-changing primary key.

Section 3. Tutorial scenario
The scenario used for this tutorial has one fact table and two dimension tables that will be updated. The source file contains sales transaction records. The information in the source file is used to update the fact and dimension tables.

Figure 2. Scenario schemas
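Before looking at the scenario data, the two change types from Section 2 can be made concrete with a small Python sketch. It is only an illustration of the Overwrite (Type1) and Tracking History (Type2) behavior on an in-memory list of rows. The function names, the column names (`sk`, `bk`, `current`, `eff_date`, `exp_date`), and the sample values are assumptions made for this example and are not part of the tutorial download.

```python
from datetime import date

FAR_FUTURE = date(2099, 12, 31)  # assumed "still current" expiration date


def apply_type1(dim_rows, business_key, new_attrs):
    """Overwrite: update the current row in place; the old values are lost."""
    for row in dim_rows:
        if row["bk"] == business_key and row["current"]:
            row.update(new_attrs)
    return dim_rows


def apply_type2(dim_rows, business_key, new_attrs, next_sk, today):
    """Tracking history: expire the current row, then insert a new row."""
    for row in dim_rows:
        if row["bk"] == business_key and row["current"]:
            row["current"] = False
            row["exp_date"] = today
    # The new row gets its own surrogate key, so both versions can coexist
    # under the same business key.
    dim_rows.append({"sk": next_sk, "bk": business_key, "current": True,
                     "eff_date": today, "exp_date": FAR_FUTURE, **new_attrs})
    return dim_rows


# Hypothetical dimension with a single current row.
dim = [{"sk": 1, "bk": "P-100", "descr": "old text", "current": True,
        "eff_date": date(2004, 1, 1), "exp_date": FAR_FUTURE}]
apply_type2(dim, "P-100", {"descr": "new text"}, next_sk=2,
            today=date(2009, 3, 12))
for row in dim:
    print(row["sk"], row["bk"], row["descr"], row["current"])
```

After the Type2 call the dimension holds two rows for the same business key, which is why the business key alone cannot serve as the primary key.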

Source data
The source data file is named SaleDetail.dat and is contained in the SCD_Tutorial.zip download. It contains five records that, when processed, apply changes to the fact and dimension tables. Table 1 shows the contents of the file.

Table 1. Source data
StoreId StoreName StoreMgr ProdSKU ProdBrand ProdDescr SaleAmt SaleUnits
A1111 A1112 A1113 A1114 A1115 Stuff Washington 1111111111 Bob's Red box 00436.14 13 00456.87 2 00024.38 7 Duckie McStuff Madison 4444444444 AAAAA fork Stuff Monroe 5555555555 Best lawn 00308.56 14 MoreStuff Adams 2222222222 Squeaky Blue Chair Stuffy's Jefferson 3333333333 Sunshine Yellow 00203.40 11 Jr. mower

Product dimension
The product dimension is a table in the target database. Initially this table contains records for three products. When the source data is processed, the table is updated to contain new product records, and to track the history of changed product information. The Setup.bat file in the SCD_Tutorial.zip download contains a script that creates and populates this table with the data shown in Table 2.

Table 2. Initial product dimension data
ProdSK  SKU         Brand     Descr          Curr  EffDate     ExpDate
1       3333333333  Sunshine  Yellow Duckie  Y     2004-01-01  2099-12-31
2       4444444444  AAAAA     spoon          Y     2004-01-01  2099-12-31
10      5555555555  AAAAA     grass cutter   Y     2004-01-01  2099-12-31

Store dimension
The store dimension is a table in the target database. Initially this table contains records for three stores. When the source data is processed, the table is updated to contain new store records, and to overwrite changed store information. The Setup.bat file in the SCD_Tutorial.zip download contains a script that creates and populates this table with the data shown in Table 3.

Table 3. Initial store dimension data
StoreSK  ID     Name       Mgr
1        A1113  Stuffy's   Jefferson
2        A1114  McStuff    Adams
5        A1115  Lil Stuff  Monroe

Fact table
The fact table is a table in the target database. Initially this table contains no records. When the source data is processed, the table is updated with the sales facts and references to the corresponding dimension records. The Setup.bat file in the SCD_Tutorial.zip download contains a script that creates the table as shown in Table 4.

Table 4. Initial Fact table data
ProdSK  StoreSK  SaleAmt  SaleUnits

Section 4. Setting up the tutorial
To set up the tutorial, save the SCD_Tutorial.zip file from the Download section to your local file system and follow these steps:
1. Check if you already have the following directory structure: C:\IBM\Demo\DataStage. If not, create it.
2. Extract the contents of SCD_Tutorial.zip into C:\IBM\Demo\DataStage. Be sure to select the option in your extraction program that indicates you want to use the folder or directory names when extracting. You should end up with the directory C:\IBM\Demo\DataStage\SCD, which contains several files and an empty sub-directory named SKG.
3. Run C:\IBM\Demo\DataStage\SCD\setup.bat.
4. In the DataStage Administrator client, set the environment variable APT_DB2INSTANCE_HOME to the location where the db2nodes.cfg file exists. Typically this is C:\IBM\SQLLIB\DB2. This configures the project to access DB2 as a source or target for the DB2 Enterprise Stage.
5. Using the DataStage Designer client, import C:\IBM\Demo\DataStage\SCD\SCD_Tutorial.dsx into your DataStage project.

Verify the state of the database
Run the Results executable shortcut in the C:\IBM\Demo\DataStage\SCD directory. This displays the contents of the product and store dimensions as well as the fact table. Review the output to verify that the tables have been initialized properly.

Resetting the tutorial
Once the tutorial has been run the first time, the contents of the database will have changed. Therefore, subsequent runs would see different behavior. If you want to reset the database tables back to their initial state, run the zReset executable shortcut in the C:\IBM\Demo\DataStage\SCD directory.

Initializing the surrogate keys

The tutorial uses surrogate key generators that use state files to record the key values that have been used. This ensures that unique values are always generated. Because the dimension tables are created with data in them, you need to make the surrogate key generators aware of what values have already been used. Compile and run the Demo\DataStage\Slowly Changing Dimensions\Surrogate Key Generation\CreateAndUpdate_File job to initialize the state files. The job reads the product dimension table and the store dimension table, then creates and updates the respective surrogate key generator state files.

Building the Slowly Changing Dimensions job
In this step you build a job that reads the SaleDetail.dat source file, updates the product and store dimensions, and inserts records into the fact table. For reference, a completed version of the job named Demo\DataStage\Slowly Changing Dimensions\SCD_All is included in the download. Draw the job design as illustrated below in Figure 3.

Figure 3. Job design

The primary flow of records is from left to right in the job design. The source records are read from SaleDetail.dat, passed to the first SCD stage to process the Product dimension, then passed to the next SCD stage to process the store dimension, and finally to the fact table. No records are added or removed on this flow of data. Every record read from the source is inserted into the fact table. As part of the processing in the SCD stages, the surrogate key values that are associated with the source records are obtained from the dimension table and added to the data being passed to the fact table.

Looking at the job design from top to bottom, the product and store dimension tables are reference sources to the SCD stages. These tables are used to initialize the lookup cache. Only records that are considered current are stored in the lookup cache. Any historical records in the dimension tables are automatically filtered out during initial processing. The SCD stage uses the data values from the primary input link to look up into the cache and check for changes. Each record on the primary input link of the SCD stage will go out on the primary output link, and may produce zero, one, or two records on the dimension update link. If any changes are required to the dimension table, they are written to the secondary output link of the SCD stage, which is called the dimension update link. The number of records produced depends on what, if any, action needs to be taken on the dimension table.
• Zero records: Unchanged records require no action to the dimension table, so no records are written on the dimension update link.
• One record: New records and overwriting updates (Type1) require a one row change to the dimension table. The change is either an insert or an update. One record is written on the dimension update link to reflect these types of changes.
• Two records: Changed records that are tracking history (Type2) require a two row change to the dimension table. The existing record must be updated to reflect that it is no longer current, and a new record must be inserted for the new set of values. Two records are written to the dimension update link to reflect these changes.
Target database stages are connected to the dimension update link to apply the changes to the actual dimension table in the database.

Configuring the stages
Now that you have built the high level job design, you are ready to perform the next set of steps in which you:
• Configure the individual stages to access the source data.
• Process the dimension tables.
• Update the fact table.

Configure the primary source stage
The source stage must be configured to read the SaleDetail.dat file. Complete the following steps to configure the SaleDetail sequential file stage:
1. On the Output|Properties tab, set the File property to C:\IBM\Demo\DataStage\SCD\SaleDetail.dat.
2. On the Output|Format tab, remove the Final delimiter property.
3. On the Output|Format tab, add the Record delimiter string property and set it to DOS Format.
4. Load the Demo\DataStage\Slowly Changing Dimensions\TableDefs\SaleDetail table definition onto the output link.

Figure 4. Source stage
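To give a feel for what the two Format settings imply about the file layout, the following Python sketch parses text whose records end with CR LF (DOS format) and whose last field is not followed by an extra field delimiter. It is a rough analogy rather than a description of how the sequential file stage works internally. The comma field delimiter, the function name `parse_sale_detail`, and the sample records are assumptions; only the column names come from Table 1.

```python
import csv
import io

# Column names from Table 1; the comma field delimiter is an assumption.
COLUMNS = ["StoreId", "StoreName", "StoreMgr", "ProdSKU",
           "ProdBrand", "ProdDescr", "SaleAmt", "SaleUnits"]


def parse_sale_detail(raw_text, delimiter=","):
    """Split DOS-format (CR LF terminated) text into one dict per record."""
    # newline="" keeps the CR LF pairs intact so the csv module handles them.
    reader = csv.reader(io.StringIO(raw_text, newline=""), delimiter=delimiter)
    records = []
    for fields in reader:
        if not fields:  # ignore an empty line, if any
            continue
        if len(fields) != len(COLUMNS):
            # A trailing (final) delimiter would show up here as an extra,
            # empty field.
            raise ValueError(
                f"expected {len(COLUMNS)} fields, got {len(fields)}")
        records.append(dict(zip(COLUMNS, fields)))
    return records


# Hypothetical records, not the contents of SaleDetail.dat.
sample = ("S0001,Example Store,Smith,9999999999,BrandX,widget,00010.00,3\r\n"
          "S0002,Other Store,Jones,8888888888,BrandY,gadget,00020.50,1\r\n")
for rec in parse_sale_detail(sample):
    print(rec["StoreId"], rec["SaleAmt"], rec["SaleUnits"])
```

If the file did carry a delimiter after the last column, each record would parse into nine fields and the sketch would raise an error, which is the kind of mismatch that View Data helps you catch in the stage itself.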

The source stage should now be configured to read the SaleDetail.dat file. Use View Data to confirm that the data is being read properly.

Configure the stages to process the Product dimension
Three stages are used to process the Product dimension. Reading the job design from top to bottom:
• The first stage specifies how to read the data from the dimension table.
• The SCD stage determines what changes need to be made to the dimension table and those changes are written to the dimension update link.
• The dimension update link is connected to the dimension update target stage, which specifies how to update the actual database table with the data produced by the SCD stage.

Configure the Product dimension source stage
Complete the following steps to configure the Product dimension DB2 Enterprise stage:

1. On the Output|Properties tab, set the Read method property to Table.
2. On the Output|Properties tab, set the Table property to SCD.ProdDim.
3. On the Output|Properties tab, set the Use Default Database and Use Default Server properties to False.
4. On the Output|Properties tab, set the Database property to SCDDemo.
5. On the Output|Properties tab, set the Server property to DB2.
6. Load the Demo\DataStage\Slowly Changing Dimensions\TableDefs\SCD.ProdDim table definition onto the output link.

Figure 5. Product dimension source

The stage should now be configured to read the SCD.ProdDim table. Use View Data to confirm that the data is being read from the database properly.

Configure the Product dimension SCD stage
The Fast Path control of the SCD stage editor lets you navigate directly to the tabs that require input in order to complete the stage configuration. The control is in the lower left corner of the editor. Use the arrow buttons to move forward or backward through the tabs. Open the product dimension SCD stage editor and use the Fast Path control to set the properties as shown:

Fast Path control

The SCD stage has two input links and two output links. This results in a high number of property link-tab combinations. Use the Fast Path control to move directly to the tabs that are required to configure the stage.

• Fast Path page 1: Setting the output link
By default, the first output link connected to the stage is used as the primary output link. Look at the link name that is displayed in the Select output link property. Use the drop down list to select the output link that is leading to the next SCD stage. This is the primary output of the stage. The other link automatically becomes the dimension update link.

Figure 6. Product dimension SCD stage, Fast Path page 1

• Fast Path page 2: Define the lookup condition and purpose codes
The first task on this page is to define what the various columns of the dimension table are used for. This information is used in a number of ways in the SCD processing. The choices for purpose codes are:

• (blank): This column is not used for anything with respect to SCD processing. Data for this field is inserted into the table when a new row is inserted, but this column will not be checked for changes against the source data.
• Surrogate Key: This column is the primary key of the dimension table and is populated with a surrogate key value.
• Business Key: This column is the identifier of the business objects that the dimension table is representing. This column is typically used as a lookup column and corresponds to a key or some other field of the source data that identifies the associated business object, but is not the primary key of the dimension table.
• Type 1: Check this column for a change in value. If the value has changed, perform an overwriting change to the dimension table.
• Type 2: Check this column for a change in value. If the value has changed, perform a history tracking change to the dimension table.
• Current Indicator: This column is used as a flag to indicate whether it is the most current record for a particular business key.
• Effective Date: This column is used to specify when a record first became the most current record, that is, when it became the active record.
• Expiration Date: This column is used to specify the ending date of when a record was the active record. For currently active records, this value is typically a future date or NULL.
• SK Chain: This column is used to store the surrogate key of the previous or next record in the history for a particular business key.

Set purpose codes for the columns as shown below in Figure 7. Because this dimension table is tracking history, it contains columns to track whether a row is current and the date range for when it was current. Click on the ProdSKU source field and drag it to the SKU dimension column to create the lookup condition. The lookup is used to find the dimension table row that corresponds to a source data row.

Figure 7. Product dimension SCD stage, Fast Path page 2
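As a rough illustration of the lookup condition, the sketch below treats the lookup cache as a Python dictionary keyed by the business key and holding only current rows, in line with the earlier description of the cache. The function names and the expired row with `ProdSK` 0 are hypothetical; the three current rows are modeled on the initial product dimension.

```python
# Rows modeled on the initial product dimension (Table 2). The expired row
# with ProdSK 0 is hypothetical and is included only to show the filtering.
PROD_DIM = [
    {"ProdSK": 0, "SKU": "4444444444", "Curr": "N"},
    {"ProdSK": 1, "SKU": "3333333333", "Curr": "Y"},
    {"ProdSK": 2, "SKU": "4444444444", "Curr": "Y"},
    {"ProdSK": 10, "SKU": "5555555555", "Curr": "Y"},
]


def build_lookup_cache(dim_rows, key_col="SKU", current_col="Curr"):
    """Keep only current rows, indexed by the business key column."""
    return {row[key_col]: row for row in dim_rows if row[current_col] == "Y"}


def find_dimension_row(cache, source_row, source_col="ProdSKU"):
    """Equality lookup: source ProdSKU against dimension SKU."""
    return cache.get(source_row[source_col])


cache = build_lookup_cache(PROD_DIM)
match = find_dimension_row(cache, {"ProdSKU": "4444444444"})
print(match["ProdSK"] if match else "no match")  # prints 2
```

A source row whose SKU is not in the cache returns `None`, which corresponds to the case where a new dimension record has to be created.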

Although this tab looks similar to a mapping tab, it is actually defining the lookup keys from the source record to the dimension record. Any source column can be associated with any one dimension column. This creates an equality lookup condition between those columns. If more than one source column is associated with a dimension column, then those equality conditions are AND'ed together. In this manner, multi-column lookup keys can be used.

• Fast Path page 3: Configuring the surrogate key generator
Surrogate key generation capabilities are integrated into the SCD stage. This tab specifies how surrogate keys are generated for this stage. Surrogate key generation can use DataStage's file based surrogate generation, or use DB2 or Oracle database sequence object based generation. This tutorial uses the file based method. Set the Source name property to C:\IBM\Demo\DataStage\SCD\SKG\ProdDim as shown in Figure 8. This is the surrogate key state file you created by running the Demo\DataStage\Slowly Changing Dimensions\Surrogate Key Generation\CreateAndUpdate_File job. Leave the defaults for the other properties unchanged.

Figure 8. Product dimension SCD stage, Fast Path page 3

• Fast Path page 4: Defining the slowly changing dimension behavior and derivations
The DimUpdate tab is used to define several critical elements of SCD processing. The Derivation column is used to specify how to map elements of a source row to elements of the dimension table. The Expire column is used to specify what values need to change if an existing record needs to be expired. Expire expressions are only enabled when there are Type2 columns specified, and are only available for Current Indicator and Expiration Date columns.

If no matching record is found when the lookup is performed, the derivation expressions are applied and a record is written on the dimension update link to indicate a new record needs to be added to the dimension table. If a matching record is found, the derivation expressions are applied to the source columns, and then the results are compared to the corresponding columns of the dimension table. Columns specified as Type2 are compared first. If there is a change, two records are written on the dimension update link. The first record is an update record, to expire the matched row. The Expire expressions are used to calculate the values for the update row. The second record is a new record that contains all of the new values for all columns. If no Type2 columns have changed, the Type1 columns are compared. If there are any changes, one record is written on the dimension update link that indicates an update to the dimension table. The derivation expressions are used to calculate the values for the update record.

Set the Derivation expressions and the Expire expressions as shown below in Figure 9.

Figure 9. Product dimension SCD stage, Fast Path page 4

Note that you are specifying these properties on the dimension update link. The output columns for this link were automatically propagated with their purpose codes from the dimension input link. The SCD stage only does this when the set of columns on the dimension update link is empty. It is possible to load a set of columns directly on the dimension update link; however, they must exactly match those specified on the dimension input link.

• Fast Path page 5: Selecting the columns for Output Link
The Output Map tab is used to define what columns will leave this stage on the primary output link. This tab operates much like the Mapping tab of other stages. The only difference is that you can select columns from the primary input link and columns from the reference link to output. The columns coming from the primary source have the same values they entered the stage with. The columns coming from the reference link represent the values from the dimension table that correspond to the source row. Note that because the SCD processing has been done by the stage, every record from the primary source data will have a corresponding record in the dimension.

The output link is initially empty. Create and map the output columns by dragging and dropping from the source to the target. Select the columns for output as shown below in Figure 10. Because the product dimension has now been processed, the source columns that contain those attributes are no longer needed. Instead, the primary key associated with the source row is appended because that is the value that is required to be inserted into the fact table.

Figure 10. Product dimension SCD stage, Fast Path page 5

The stage is now configured to perform the dimension maintenance on the Product dimension table.

Configure the Product dimension target stage
This stage processes the dimension update link records produced by the product dimension SCD stage to update the actual dimension table in the database. Because incoming records represent both inserts and updates to the table, an Upsert write method must be used. Auto-generated update and insert statements take the purpose codes specified in the SCD stage into account to generate the correct update statement for this usage. Complete the following steps to configure the Product dimension update DB2 Enterprise stage:
1. On the Input|Properties tab, set the Write Method property to Upsert.
2. On the Input|Properties tab, set the Upsert Mode property to Auto-generated Update and Insert.
3. On the Input|Properties tab, set the Table property to SCD.ProdDim.
4. On the Input|Properties tab, set the Use Default Database and Use Default Server to False.
5. On the Input|Properties tab, set the Database property to SCDDemo.
6. On the Input|Properties tab, set the Server property to DB2.

Figure 11. Product dimension target
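The general idea behind an upsert can be sketched with Python's built-in sqlite3 module standing in for DB2. This is not the SQL that the DB2 Enterprise stage generates. The sketch assumes that rows are matched on the surrogate key and that every non-key column is rewritten on update; the function name `upsert_dim_row`, the unqualified table name, and the sample values are illustrative.

```python
import sqlite3


def upsert_dim_row(conn, rec):
    """Try an UPDATE keyed on the surrogate key; INSERT if nothing matched."""
    cur = conn.execute(
        "UPDATE ProdDim SET SKU = :SKU, Brand = :Brand, Descr = :Descr, "
        "Curr = :Curr, EffDate = :EffDate, ExpDate = :ExpDate "
        "WHERE ProdSK = :ProdSK",
        rec,
    )
    if cur.rowcount > 0:
        return "update"
    conn.execute(
        "INSERT INTO ProdDim "
        "(ProdSK, SKU, Brand, Descr, Curr, EffDate, ExpDate) VALUES "
        "(:ProdSK, :SKU, :Brand, :Descr, :Curr, :EffDate, :ExpDate)",
        rec,
    )
    return "insert"


conn = sqlite3.connect(":memory:")  # SQLite stands in for the DB2 database
conn.execute(
    "CREATE TABLE ProdDim (ProdSK INTEGER PRIMARY KEY, SKU TEXT, Brand TEXT, "
    "Descr TEXT, Curr TEXT, EffDate TEXT, ExpDate TEXT)"
)
conn.execute(
    "INSERT INTO ProdDim VALUES "
    "(2, '4444444444', 'AAAAA', 'spoon', 'Y', '2004-01-01', '2099-12-31')"
)

# Two illustrative dimension update records for one history-tracking change:
# the first expires the existing row, the second adds the new version.
update_link = [
    {"ProdSK": 2, "SKU": "4444444444", "Brand": "AAAAA", "Descr": "spoon",
     "Curr": "N", "EffDate": "2004-01-01", "ExpDate": "2009-03-12"},
    {"ProdSK": 11, "SKU": "4444444444", "Brand": "AAAAA", "Descr": "fork",
     "Curr": "Y", "EffDate": "2009-03-12", "ExpDate": "2099-12-31"},
]
results = [upsert_dim_row(conn, rec) for rec in update_link]
print(results)
```

The same routine handles both record kinds on the link, which is the reason a plain insert-only write method would not be enough for the dimension target.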

The stage is now configured to write to the SCD.ProdDim dimension table.

Configure the stages to process the Store dimension

Configure the Store dimension source stage
Complete the following steps to configure the Store dimension DB2 Enterprise stage:
1. On the Output|Properties tab, set the Read Method property to Table.
2. On the Output|Properties tab, set the Table property to SCD.StoreDim.
3. On the Output|Properties tab, set the Use Default Database and Use Default Server to False.
4. On the Output|Properties tab, set the Database property to SCDDemo.
5. On the Output|Properties tab, set the Server property to DB2.
6. Load the Demo\DataStage\Slowly Changing Dimensions\TableDefs\SCD.StoreDim table definition onto the output link.

Figure 12. Store dimension source stage

The stage should now be configured to read the SCD.StoreDim table. Use View Data to confirm that the data is being read from the database properly.

Configure the Store dimension SCD stage
Open the store dimension SCD stage editor and use the Fast Path control to set the properties as shown:

• Fast Path page 1: Setting the Output Link
Use the Select output link drop down list to select the link leading to the fact table. This is the primary output of the stage. The other link automatically becomes the dimension update link.

Figure 13. Store dimension SCD stage, Fast Path page 1

• Fast Path page 2: Define the lookup condition and purpose codes
Set purpose codes for the columns as shown below in Figure 14. Because this dimension table is not tracking history, it does not contain columns to track whether a row is current or not. The Name column has a blank purpose code, which indicates that this column will not be checked for changes. Click on the StoreId source field and drag it to the dimension column Id to create the lookup condition.

Figure 14. Store dimension SCD stage, Fast Path page 2

• Fast Path page 3: Configuring the surrogate key generator
Set the Source name property to C:\IBM\Demo\DataStage\SCD\SKG\StoreDim as shown in Figure 15. This is the surrogate key state file you created by running the Demo\DataStage\Slowly Changing Dimensions\Surrogate Key Generation\CreateAndUpdate_File job. Leave the defaults for the other properties.

Figure 15. Store dimension SCD stage, Fast Path page 3

• Fast Path page 4: Defining the slowly changing dimension behavior and derivations
Set the Derivation expressions as shown below in Figure 16. Because the Name column has no purpose code, the SCD stage does not check this column for changes when a matching dimension record is found on the lookup. Because there are no Type2 columns in this dimension table, the Expire expression is not enabled for any column.

Figure 16. Store dimension SCD stage, Fast Path page 4

• Fast Path page 5: Selecting the columns for Output Link
Select the columns for output as shown below in Figure 17. Because the store dimension has now been processed, the source columns that contain those attributes are no longer needed. Instead, the surrogate key associated with the source row is appended because that is the value that is required to be inserted into the fact table.

Figure 17. Store dimension SCD stage, Fast Path page 5
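The comparison rules described for Fast Path page 4 apply to both SCD stages and can be sketched in Python. This is an illustration of the described behavior, not the stage's implementation. The function name `scd_process`, the use of `itertools.count` as a stand-in for the state-file key generator, and the sample values are assumptions. Columns with a blank purpose code are simply left out of both column lists, so they are never compared.

```python
import itertools


def scd_process(derived, matched, type2_cols, type1_cols, sk_col, key_gen,
                expire=None):
    """Return (dimension update records, surrogate key for the fact row).

    derived: dimension-shaped values computed from one source row
    matched: the current dimension row found by the lookup, or None
    expire:  column values that mark a row as expired (Type2 only)
    """
    if matched is None:
        # No match: one record, a new dimension row with a new key.
        new_row = {**derived, sk_col: next(key_gen)}
        return [new_row], new_row[sk_col]
    if any(derived[c] != matched[c] for c in type2_cols):
        # Type2 columns are compared first: expire the old row, add a new one.
        expired = {**matched, **(expire or {})}
        new_row = {**derived, sk_col: next(key_gen)}
        return [expired, new_row], new_row[sk_col]
    if any(derived[c] != matched[c] for c in type1_cols):
        # Type1 change: one record that overwrites the existing row.
        updated = {**matched, **{c: derived[c] for c in type1_cols}}
        return [updated], matched[sk_col]
    # Unchanged: nothing is written on the dimension update link.
    return [], matched[sk_col]


# Store dimension example: Mgr is treated as Type1 and Name has no purpose
# code. The existing row is modeled on Table 3; the new manager is made up.
store_keys = itertools.count(6)  # assumed next unused StoreSK
existing = {"StoreSK": 2, "ID": "A1114", "Name": "McStuff", "Mgr": "Adams"}
incoming = {"ID": "A1114", "Name": "McStuff", "Mgr": "Smith"}
records, fact_sk = scd_process(incoming, existing, [], ["Mgr"], "StoreSK",
                               store_keys)
print(len(records), fact_sk)
```

Passing an empty `type2_cols` list, as for the store dimension, means the expire branch can never run, which mirrors the Expire expression being disabled when no Type2 columns are defined.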

The stage is now configured to perform the dimension maintenance on the store dimension table.

Configure the Store dimension target stage

This stage processes the dimension update records produced by the store dimension SCD stage to update the actual dimension table in the database. Complete the following steps to configure the Store dimension target DB2 Enterprise stage:

1. On the Input|Properties tab, set the Write method property to Upsert.
2. On the Input|Properties tab, set the Upsert Mode property to Auto-generated Update and Insert.
3. On the Input|Properties tab, set the Table property to SCD.StoreDim.
4. On the Input|Properties tab, set the Use Default Database and Use Default Server to False.
5. On the Input|Properties tab, set the Database property to SCDDemo.

6. On the Input|Properties tab, set the Server property to DB2.

Figure 18. Store dimension target stage

The stage is now configured to write to the SCD.StoreDim dimension table.

Configure the Fact table target stage

This stage processes the source records that have been passed through the primary output links to update the actual fact table in the database. At this point, the original input source records have been processed so that the only columns on this link are the measurements (SaleAmt and SaleUnits) and the surrogate key values for the associated Product and Store. Complete the following steps to configure the Fact table target DB2 Enterprise stage:

1. On the Input|Properties tab, set the Write Method property to Write.
2. On the Input|Properties tab, set the Write Mode property to Append.
3. On the Input|Properties tab, set the Table property to SCD.Facttbl.

4. On the Input|Properties tab, set the Use Default Database and Use Default Server to False.
5. On the Input|Properties tab, set the Database property to SCDDemo.
6. On the Input|Properties tab, set the Server property to DB2.

Figure 19. Fact table target stage

The stage is now configured to write to the SCD.Facttbl fact table.

Final steps

You have now completed the job design and are ready to compile. Note that the SCD stage processing makes use of the transform operator, so for the job to compile successfully, the C++ compiler settings for the project must be correct. See the Information Server Configuration Guide for details on how to configure the environment correctly for your C++ compiler. The Resources page contains a link to an article in the information center for IBM Information Server with details on configuring your environment correctly for your C++ compiler.

Click the Compile button to start the compile. If any compile errors occur, check your job and stages against the settings specified in the tutorial and make any necessary changes.
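The two target stages configured in this section differ mainly in their write method: Upsert for the dimension table and Write with Append for the fact table. The Python sketch below illustrates that difference using an in-memory SQLite database as a stand-in for DB2. The table layouts, the use of StoreSK as the update key, the update-then-insert order, and the sample rows are assumptions; the tutorial does not show the statements the stage generates.

```python
import sqlite3

# In-memory SQLite stands in for the SCDDemo DB2 database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StoreDim (StoreSK INTEGER PRIMARY KEY, Id TEXT, Name TEXT, Mgr TEXT)")
conn.execute("CREATE TABLE Facttbl (ProdSK INTEGER, StoreSK INTEGER, SaleAmt REAL, SaleUnits INTEGER)")


def upsert_store(conn, row):
    """Update the row with this surrogate key; insert it if no row was updated."""
    cur = conn.execute(
        "UPDATE StoreDim SET Id = :Id, Name = :Name, Mgr = :Mgr WHERE StoreSK = :StoreSK",
        row,
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO StoreDim (StoreSK, Id, Name, Mgr) VALUES (:StoreSK, :Id, :Name, :Mgr)",
            row,
        )


def append_fact(conn, row):
    """Append a fact row; existing rows are never touched."""
    conn.execute(
        "INSERT INTO Facttbl (ProdSK, StoreSK, SaleAmt, SaleUnits) "
        "VALUES (:ProdSK, :StoreSK, :SaleAmt, :SaleUnits)",
        row,
    )


# Made-up rows: the second upsert overwrites the first because the key matches.
upsert_store(conn, {"StoreSK": 1, "Id": "S1", "Name": "First", "Mgr": "Ann"})
upsert_store(conn, {"StoreSK": 1, "Id": "S1", "Name": "First", "Mgr": "Bea"})

# Appending the same fact row twice yields two rows.
fact = {"ProdSK": 7, "StoreSK": 1, "SaleAmt": 10.0, "SaleUnits": 1}
append_fact(conn, fact)
append_fact(conn, fact)

print(conn.execute("SELECT * FROM StoreDim").fetchall())
print(conn.execute("SELECT COUNT(*) FROM Facttbl").fetchone()[0])
```

Upsert suits the dimension link because it carries both updates to existing rows and brand-new rows, while the fact link only ever adds rows.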

Section 5. Running the tutorial

At this point, you are ready to compile and run the job. Run the Results executable shortcut in the C:\IBM\Demo\DataStage\SCD directory to see the initial contents of the database tables. The Results shortcut displays the contents of the product dimension, the store dimension, and the fact table. Run the job by clicking the Run button in the DataStage Designer. After the job finishes successfully, run the Results shortcut again to see the changes that were made to the database tables.

Summary of changes to database tables

The contents of the database tables should now appear as follows:

• The product dimension has two update records and four new records. Two of the new records are new objects to the dimension table, and two existing records had Type2 changes, resulting in the two updates and two of the new records.

Change              ProdSK  SKU         Brand     Descr          Curr  EffDate         ExpDate
No Change           1       3333333333  Sunshine  Yellow Duckie  Y     2004-01-01      2099-12-31
Expired (Type2)     2       4444444444  AAAAA     spoon          N     2004-01-01      {Today's Date}
Expired (Type2)     10      5555555555  AAAAA     grass cutter   N     2004-01-01      {Today's Date}
New Record          3       1111111111  Bob's     Red Box        Y     {Today's Date}  2099-12-31
New Record          4       2222222222  Squeaky   Blue Chair     Y     {Today's Date}  2099-12-31
New Record (Type2)  5       4444444444  AAAAA     fork           Y     {Today's Date}  2099-12-31
New Record (Type2)  6       5555555555  Best      lawn mower     Y     {Today's Date}  2099-12-31

• The store dimension has one updated record and two new records. The updated record had a Type1 change and the two new records are new objects to the dimension table.

Change      StoreSK  ID     Name       Mgr
No Change   1        A1113  Stuffy's   Jefferson
Update      2        A1114  McStuff    Madison
No Change   5        A1115  Lil Stuff  Monroe
New Record  3        A1111  Stuff      Washington
New Record  4        A1112  MoreStuff  Adams

• The fact table has five new records, one for each source record processed. The surrogate key values in this table correspond to the current records in the dimension tables.

ProdSK  StoreSK  SaleAmt  SaleUnits
3       3        436.14   13
4       4        456.56   14
1       1        203.87   7
5       2        24.38    2
6       5        308.40   11

The contents of the dimension tables have now changed. If you were to run the job again, what results would you expect to see? Hint: The dimension tables and the source file are now in sync.

To reset the database tables to their original state, run the zReset executable shortcut.

Conclusion

You can use the Slowly Changing Dimension stage to greatly reduce the time you spend creating jobs for processing star schemas. In this tutorial you have learned how to configure the Slowly Changing Dimension stage to process history-tracking changes and in-place changes to dimension tables. You have also seen how you can reduce fact table processing by augmenting the source data with associated dimension table surrogate keys that eliminate the need for an additional lookup.

This completes the Slowly Changing Dimensions tutorial.


Downloads

Description                                           Name              Size  Download method
Supporting scripts and DS jobs for this tutorial      SCD_Tutorial.zip  16KB  HTTP

Information about download methods

Resources

Learn
• In the InfoSphere area on developerWorks, get the resources you need to advance your InfoSphere product skills.
• C++ compiler for job development topic in the information center for IBM Information Server.
• Browse the technology bookstore for books on these and other technical topics.

Get products and technologies
• Download IBM product evaluation versions and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.

Discuss
• Participate in the discussion forum for this content.
• Check out developerWorks blogs and get involved in the developerWorks community.

About the author

Brian Caufield
Brian Caufield is a software architect in IBM Silicon Valley Lab. Brian has been working in the DataStage development organization for 10 years and was involved in the design of the Slowly Changing Dimension Stage.
