Informatica SCD Type-2 Implementation: Natural Key Surrogate Keys
The Type 2 method tracks historical data by creating multiple records for a given natural key in the
dimension table, each with a separate surrogate key and/or a different version number. Type 2
preserves unlimited history, because a new record is inserted each time a change is made. Type 2 can
be implemented in several ways.
How to implement SCD Type 2 through Informatica:
There are a number of ways to implement SCD Type 2 in Informatica.
Example:
Create the source table CUST with the query below.
CREATE TABLE CUST
(
CUST_ID NUMBER,
CUST_NM VARCHAR2(250),
ADDRESS VARCHAR2(250),
CITY VARCHAR2(50),
STATE VARCHAR2(50),
INSERT_DT DATE,
UPDATE_DT DATE);
CUST_ID  CUST_NM        ADDRESS            CITY       STATE  INSERT_DT  UPDATE_DT
80001    Marion Atkins  100 Main St.       Bangalore  KA     7/1/2011   7/1/2011
80002    Laura Jones    510 Broadway Ave.  Hyderabad  AP     7/1/2011   7/1/2011
80003    Jon Freeman    555 6th Ave.       Bangalore  KA     7/1/2011   7/1/2011
Here the natural key is CUST_ID and the surrogate key is PM_PRIMARYKEY: each change creates a new
record for the same CUST_ID in the dimension table, with its own PM_PRIMARYKEY.
For this, create the target table CUST_D.
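The CREATE statement for the target is not reproduced here. As a rough sketch only, the columns named in the lookup SQL override later in this article suggest a shape like the one below. The data types, the PRIMARY KEY constraint and the use of Python's sqlite3 module in place of the actual database are all assumptions made for illustration.

```python
import sqlite3

# Assumed shape of the CUST_D dimension. The column names come from the
# lookup SQL override in this article; every data type here is a guess.
CUST_D_DDL = """
CREATE TABLE CUST_D (
    PM_PRIMARYKEY INTEGER PRIMARY KEY,  -- surrogate key
    CUST_ID       INTEGER,              -- natural key
    CUST_NM       TEXT,
    ADDRESS       TEXT,
    CITY          TEXT,
    STATE         TEXT,
    ACTIVE_DT     TEXT,                 -- date the version became active
    INACTIVE_DT   TEXT,                 -- NULL while the version is active
    INSERT_DT     TEXT,
    UPDATE_DT     TEXT
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(CUST_D_DDL)

# List the column names to confirm the table was created as intended.
columns = [row[1] for row in conn.execute("PRAGMA table_info(CUST_D)")]
print(columns)
```

Keeping INACTIVE_DT nullable matters for the rest of the walkthrough, because a null inactive date is what marks a version as active.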
Flow (objects in the mapping diagram): CUST, SQ_CUST, lkp_CUST_D, exp_FLAG, rtr_INS_UPD,
upd_INSERT, CUST_D_INS, upd_UPDATE, CUST_D_UPD
Before implementing the mapping, identify the attributes whose changes need to be kept as history.
In this example the tracked attributes are ADDRESS, CITY and STATE.
If any of ADDRESS, CITY or STATE changes, we insert a new record and inactivate the old
record.
Creation of mapping:
Step 1: Import the source and target definitions into Informatica from the database.
Import the source definition of the CUST table using the Source Analyzer workspace. Go to Sources > Import
from Database.
This opens the Import Tables window. Assuming that a system DSN has already been created for this
connection, specify the necessary details and click Connect.
The CUST source definition is created and appears in the workspace. Click Save to save the source
definition in the repository.
The source table CUST contains only current data and no historical data. This mapping
is run daily to capture the history in the CUST_D target table, using active and inactive
date logic.
Follow the same steps in the Target Designer to import the CUST_D table.
The source CUST and target CUST_D definitions are now available.
Mapping creation:
Create a mapping named m_SCD_Type_2.
In the lkp_CUST_D lookup transformation, create one input port named in_CUST_ID with data type double, and
add the lookup condition CUST_ID = in_CUST_ID.
Create an expression transformation and drag CUST_ID, CUST_NM, ADDRESS, CITY and STATE from the Source
Qualifier into it. In the same way, drag PM_PRIMARYKEY, CUST_NM, ADDRESS, CITY and STATE from lkp_CUST_D
into the expression. Rename the ports in the expression (src_ and lkp_ prefixes) so that source and lookup
attributes can be told apart, as shown in the diagram.
In the lookup transformation, apply a filter so that only active records are retrieved. You can do this in the
lookup SQL override:
SELECT CUST_D.PM_PRIMARYKEY as PM_PRIMARYKEY,
CUST_D.CUST_NM as CUST_NM,
CUST_D.ADDRESS as ADDRESS,
CUST_D.CITY as CITY,
CUST_D.STATE as STATE,
CUST_D.ACTIVE_DT as ACTIVE_DT,
CUST_D.INACTIVE_DT as INACTIVE_DT,
CUST_D.INSERT_DT as INSERT_DT,
CUST_D.UPDATE_DT as UPDATE_DT,
CUST_D.CUST_ID as CUST_ID
FROM CUST_D
WHERE CUST_D.INACTIVE_DT IS NULL
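Active records are the ones whose inactive date is still null, so the filter should keep exactly those rows. The following is a small check of that idea, assuming a cut-down CUST_D in an in-memory SQLite database; the two sample rows and their date value are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUST_D (PM_PRIMARYKEY INTEGER, CUST_ID INTEGER, "
    "CITY TEXT, INACTIVE_DT TEXT)"
)
conn.executemany(
    "INSERT INTO CUST_D VALUES (?, ?, ?, ?)",
    [
        (3, 80003, "Bangalore", "2011-07-03"),  # closed (historical) version
        (5, 80003, "Hyderabad", None),          # current version
    ],
)

# Same filter idea as the lookup override: keep rows with no inactive date.
active = conn.execute(
    "SELECT PM_PRIMARYKEY, CUST_ID, CITY FROM CUST_D "
    "WHERE INACTIVE_DT IS NULL"
).fetchall()
print(active)  # [(5, 80003, 'Hyderabad')]
```

With this filter the lookup returns at most one row per CUST_ID, provided the mapping never leaves two active versions for the same customer.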
The exp_FLAG expression sets out_FLAG as follows:
If a record is flagged as 1, it is a new record: it comes from the source and is not yet
available in the target.
If a record is flagged as 2, it already exists in the target and at least one tracked attribute
differs between source and target.
If a record is flagged as 3, it is present in both source and target and there is no
difference between the attributes.
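The flag expression itself is not shown in the text. The three cases can be sketched as below, assuming that a lookup miss is represented by None and that only ADDRESS, CITY and STATE are compared. The function name out_flag and the dictionary layout are hypothetical and stand in for the ports of the expression transformation.

```python
from typing import Optional

TRACKED = ("ADDRESS", "CITY", "STATE")  # attributes that trigger a new version


def out_flag(src: dict, lkp: Optional[dict]) -> int:
    """Return 1 (new), 2 (changed) or 3 (unchanged) for one source row."""
    if lkp is None:  # the lookup found no active row for this CUST_ID
        return 1
    if any(src[col] != lkp[col] for col in TRACKED):
        return 2
    return 3


src = {"CUST_ID": 80003, "ADDRESS": "555 6th Ave.", "CITY": "Hyderabad", "STATE": "AP"}
lkp = {"CUST_ID": 80003, "ADDRESS": "555 6th Ave.", "CITY": "Bangalore", "STATE": "KA"}

print(out_flag(src, None))  # 1: not in the target yet
print(out_flag(src, lkp))   # 2: CITY and STATE differ
print(out_flag(lkp, lkp))   # 3: nothing changed
```

In the mapping, the "not found" case would typically be detected by testing whether the looked-up PM_PRIMARYKEY is null.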
Now create a router transformation and drag the following ports from the expression into the router
transformation:
lkp_PM_PRIMARYKEY, src_CUST_ID, src_CUST_NM, src_ADDRESS, src_CITY, src_STATE,
out_DUMMY_DT and out_FLAG.
Create two groups in the router, one for insert and one for update. The insert group passes
both new and changed records (flags 1 and 2) for insertion.
The update group passes only the changed records (flag 2), whose old versions need to be inactivated.
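The exact group filter conditions are not given in the text. One reading of the two groups, as a minimal sketch: the insert group takes flags 1 and 2 and the update group takes flag 2 only. The function name route and the sample rows are illustrative assumptions.

```python
def route(rows):
    """Split flagged rows into the insert and update router groups."""
    insert_group = [r for r in rows if r["out_FLAG"] in (1, 2)]  # new + changed
    update_group = [r for r in rows if r["out_FLAG"] == 2]       # changed only
    return insert_group, update_group


rows = [
    {"src_CUST_ID": 80001, "out_FLAG": 3},  # unchanged: goes to neither group
    {"src_CUST_ID": 80003, "out_FLAG": 2},  # changed: goes to both groups
    {"src_CUST_ID": 80004, "out_FLAG": 1},  # new: insert group only
]
insert_group, update_group = route(rows)
print([r["src_CUST_ID"] for r in insert_group])  # [80003, 80004]
print([r["src_CUST_ID"] for r in update_group])  # [80003]
```

A changed record lands in both groups on purpose: one copy becomes the new active version and the other carries the key of the old version that has to be closed.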
Now connect the INSERT group of the router to the target. Connect src_CUST_ID, src_CUST_NM,
src_ADDRESS, src_CITY and src_STATE from the INSERT group to the respective
fields in the target, and connect out_DUMMY_DT to the INSERT_DT, UPDATE_DT and ACTIVE_DT
fields.
Create a sequence generator transformation and connect its NEXTVAL port to the target
PM_PRIMARYKEY field in the INSERT pipeline, as shown in the screen below.
Now drag lkp_PM_PRIMARYKEY and out_DUMMY_DT from the UPDATE group of the router transformation and
connect them to an update strategy transformation.
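The effect of the two pipelines on the target can be sketched as follows. This is an illustration under assumptions, not the mapping itself: the target is an in-memory list, itertools.count stands in for the sequence generator, and writing out_DUMMY_DT to INACTIVE_DT and UPDATE_DT on the update path is an assumed port-to-column link that the text does not spell out. The function name write_branches and the sample dates are hypothetical.

```python
from itertools import count


def write_branches(target, insert_group, update_group, seq):
    """Apply the UPDATE and INSERT router groups to an in-memory CUST_D."""
    by_key = {row["PM_PRIMARYKEY"]: row for row in target}
    # UPDATE pipeline: inactivate the old version, found by its surrogate key.
    for r in update_group:
        old = by_key[r["lkp_PM_PRIMARYKEY"]]
        old["INACTIVE_DT"] = r["out_DUMMY_DT"]  # assumed link
        old["UPDATE_DT"] = r["out_DUMMY_DT"]    # assumed link
    # INSERT pipeline: fresh surrogate key, out_DUMMY_DT into three date columns.
    for r in insert_group:
        target.append({
            "PM_PRIMARYKEY": next(seq),  # stands in for NEXTVAL
            "CUST_ID": r["src_CUST_ID"],
            "CUST_NM": r["src_CUST_NM"],
            "ADDRESS": r["src_ADDRESS"],
            "CITY": r["src_CITY"],
            "STATE": r["src_STATE"],
            "ACTIVE_DT": r["out_DUMMY_DT"],
            "INACTIVE_DT": None,
            "INSERT_DT": r["out_DUMMY_DT"],
            "UPDATE_DT": r["out_DUMMY_DT"],
        })


target = [{
    "PM_PRIMARYKEY": 3, "CUST_ID": 80003, "CUST_NM": "Jon Freeman",
    "ADDRESS": "555 6th Ave.", "CITY": "Bangalore", "STATE": "KA",
    "ACTIVE_DT": "2011-07-02", "INACTIVE_DT": None,
    "INSERT_DT": "2011-07-02", "UPDATE_DT": "2011-07-02",
}]
changed = {
    "lkp_PM_PRIMARYKEY": 3, "src_CUST_ID": 80003, "src_CUST_NM": "Jon Freeman",
    "src_ADDRESS": "555 6th Ave.", "src_CITY": "Hyderabad", "src_STATE": "AP",
    "out_DUMMY_DT": "2011-07-03",
}
# A changed row travels through both groups.
write_branches(target, [changed], [changed], count(4))
for row in target:
    print(row["PM_PRIMARYKEY"], row["CITY"], row["INACTIVE_DT"])
```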
Create a workflow with a session for this mapping and assign the source and target relational connections.
CUST_ID  CUST_NM        CITY       STATE  INSERT_DT  UPDATE_DT
80001    Marion Atkins  Bangalore  KA     7/1/2011   7/1/2011
80002    Laura Jones    Hyderabad  AP     7/1/2011   7/1/2011
80003    Jon Freeman    Bangalore  KA     7/1/2011   7/1/2011
Assume this is the first time the mapping is run, so there is no data in the target yet.
If you run the job on 2nd Jul, the data in the target looks like below.
For all these records the inactive date is null, which means they are all active records.
After this run, assume that the data in the source changed on 2nd Jul. The changed source data looks like below.
CUST_ID  CUST_NM        CITY       STATE  INSERT_DT  UPDATE_DT
80001    Marion Atkins  Bangalore  KA     7/1/2011   7/1/2011
80002    Laura Jones    Hyderabad  AP     7/1/2011   7/1/2011
80003    Jon Freeman    Hyderabad  AP     7/1/2011   7/2/2011
80004    Veeru          Bangalore  KA     7/2/2011   7/2/2011
In the above data, the first two records have not changed since the last refresh, so their
update date is unchanged. For the record with CUST_ID 80003, CITY and STATE changed from the previous day, so
its update date changed from 1st Jul to 2nd Jul.
If you run the same job on 3rd Jul, the target data looks like below.
Two records are inserted into the target in this run: the new record with CUST_ID 80004, and the new
version of the changed record 80003.
So for CUST_ID 80003 the target holds two records: an inactivated one (PM_PRIMARYKEY=3) and
an active one (PM_PRIMARYKEY=5). From this you can identify, over any period of time,
which record is the active one for a particular customer.
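The two runs can be replayed end to end with a small simulation. This is a sketch under assumptions: the target is an in-memory list, the function run_mapping condenses the lookup, flag, router and write steps into one loop, the address for 80004 is a placeholder because it is not given, and rows are processed in source order. Because of that ordering, the new version of 80003 gets surrogate key 4 here rather than 5; the key values depend on the order in which rows reach the sequence generator.

```python
from itertools import count

TRACKED = ("ADDRESS", "CITY", "STATE")


def run_mapping(source, target, run_dt, next_key):
    """One load: close changed versions, insert new and changed rows."""
    active = {r["CUST_ID"]: r for r in target if r["INACTIVE_DT"] is None}
    for src in source:
        old = active.get(src["CUST_ID"])
        if old is not None and all(src[c] == old[c] for c in TRACKED):
            continue  # flag 3: unchanged, nothing to write
        if old is not None:  # flag 2: inactivate the old version
            old["INACTIVE_DT"] = run_dt
            old["UPDATE_DT"] = run_dt
        target.append({  # flags 1 and 2: insert a new active version
            "PM_PRIMARYKEY": next(next_key),
            **src,
            "ACTIVE_DT": run_dt, "INACTIVE_DT": None,
            "INSERT_DT": run_dt, "UPDATE_DT": run_dt,
        })


def cust(cust_id, name, address, city, state):
    return {"CUST_ID": cust_id, "CUST_NM": name, "ADDRESS": address,
            "CITY": city, "STATE": state}


day1 = [
    cust(80001, "Marion Atkins", "100 Main St.", "Bangalore", "KA"),
    cust(80002, "Laura Jones", "510 Broadway Ave.", "Hyderabad", "AP"),
    cust(80003, "Jon Freeman", "555 6th Ave.", "Bangalore", "KA"),
]
day2 = [
    day1[0],
    day1[1],
    cust(80003, "Jon Freeman", "555 6th Ave.", "Hyderabad", "AP"),
    cust(80004, "Veeru", "(unknown)", "Bangalore", "KA"),  # placeholder address
]

target, keys = [], count(1)
run_mapping(day1, target, "2011-07-02", keys)  # first run
run_mapping(day2, target, "2011-07-03", keys)  # second run, after the change

for r in target:
    status = "active" if r["INACTIVE_DT"] is None else "inactive"
    print(r["PM_PRIMARYKEY"], r["CUST_ID"], r["CITY"], status)
```

After the second run the list holds five rows: one inactive version of 80003 with PM_PRIMARYKEY 3, and one active version for each of the four customers.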
Hope this will help you to understand SCD Type 2 logic implementation in Informatica.