Reporting the difference rows between two sources using Informatica
http://nelrickrodrigues-informatica.blogspot.com
The purpose of the mappings discussed below is to report the difference rows between two sources in different scenarios.

Scenario 1: When there are difference rows in only one of the sources, and the sources are either two flat files or a flat file and a relational table. To illustrate this scenario, the two sources considered are two comma-separated flat files, EMPLOYEE_FILE_1.txt and EMPLOYEE_FILE_2.txt. EMPLOYEE_FILE_2.txt has some extra records that need to be reported, i.e. loaded into the target, which is a relational table having the same definition as the sources. The difference rows between the two flat files are highlighted in red as shown below.

Import the two flat file definitions using the Source Analyzer in Informatica PowerCenter Designer client tool as shown below.

Create a new mapping in the Mapping Designer and drag the two source definitions into the workspace. Now, to identify the extra records from EMPLOYEE_FILE_2, a Joiner transformation followed by a Filter transformation is used.

First, drag all the ports from the source qualifier SQ_EMPLOYEE_FILE_2 into the joiner transformation. Notice that the ports created in the joiner transformation are designated as the "Detail" source by default. Then drag only the EMP_ID port from SQ_EMPLOYEE_FILE_1 into the joiner transformation as shown below. Here, the EMPLOYEE_FILE_1 source is designated as the "Master" source. The illustration discussed below uses an unsorted Joiner transformation, and since both sources have few records, either source can be treated as the "Master" source. In practice, however, to improve performance for an unsorted joiner transformation, the source with fewer rows is treated as the "Master" source, while for a sorted joiner transformation, the source with fewer duplicate key values is assigned as the "Master" source.
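The rule of thumb for picking the "Master" source can be written down as a small function. This is a minimal Python sketch, not anything Informatica provides: the function names choose_master and duplicate_key_rows are hypothetical, the EMP_ID values are made-up sample data (the real file contents appear only in the screenshots), and "fewer duplicate key values" is assumed to mean fewer rows whose key repeats a key already seen.

```python
def duplicate_key_rows(keys):
    """Number of rows whose join key repeats a key already seen in that source."""
    return len(keys) - len(set(keys))


def choose_master(sources, sorted_input):
    """Return the name of the source to designate as "Master" (rule of thumb only).

    sources: dict mapping a source name to the list of its join-key values.
    Unsorted joiner: prefer the source with fewer rows.
    Sorted joiner: prefer the source with fewer duplicate key values.
    Ties go to the first source in the dict.
    """
    if sorted_input:
        return min(sources, key=lambda name: duplicate_key_rows(sources[name]))
    return min(sources, key=lambda name: len(sources[name]))


# Hypothetical EMP_ID values for the two flat files.
sources = {
    "EMPLOYEE_FILE_1": [101, 102, 103],
    "EMPLOYEE_FILE_2": [101, 102, 103, 104, 105],
}
print(choose_master(sources, sorted_input=False))  # EMPLOYEE_FILE_1
```

With only a handful of rows either choice performs the same, which is why the walkthrough can simply pick EMPLOYEE_FILE_1.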

Double-click the joiner transformation to open the Edit view of the transformation. Click the Condition tab and specify the condition shown below.

The Join types supported in the joiner transformation are described below. Since we need to pass all the rows from EMPLOYEE_FILE_2, the Join Type used in the Properties tab is "Master Outer Join" as shown below. This ensures that all the rows from the "Detail" source and only the matching rows from the "Master" source pass through the joiner transformation.
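To make the Master Outer Join behaviour concrete, here is a minimal Python sketch of what happens to the rows. It is not Informatica code: rows are assumed to be dicts keyed by port name, NULL is represented as None, master ports get a "1" suffix to mirror the EMP_ID1 port in the text, and the sample data is hypothetical. The function name master_outer_join is illustrative only.

```python
from collections import defaultdict


def master_outer_join(master, detail, key="EMP_ID"):
    """Every detail row, plus master ports wherever the join key matches.

    Master ports get a "1" suffix (EMP_ID -> EMP_ID1), like the port names in the text.
    Detail rows with no matching master row get None (NULL) in every master port.
    """
    master_ports = list(master[0]) if master else [key]
    index = defaultdict(list)  # join key -> master rows (the joiner caches the master side)
    for m in master:
        index[m[key]].append(m)
    joined = []
    for d in detail:
        matches = index.get(d[key], [])
        if matches:
            for m in matches:  # one output row per matching master row
                joined.append({**d, **{p + "1": m[p] for p in master_ports}})
        else:
            joined.append({**d, **{p + "1": None for p in master_ports}})
    return joined


# Hypothetical sample data.
EMPLOYEE_FILE_1 = [{"EMP_ID": 101}, {"EMP_ID": 102}, {"EMP_ID": 103}]  # Master: EMP_ID only
EMPLOYEE_FILE_2 = [  # Detail: all ports
    {"EMP_ID": 101, "EMP_NAME": "JOHN", "CITY": "NEW YORK"},
    {"EMP_ID": 102, "EMP_NAME": "MARY", "CITY": "LONDON"},
    {"EMP_ID": 103, "EMP_NAME": "RAVI", "CITY": "MUMBAI"},
    {"EMP_ID": 104, "EMP_NAME": "ANNA", "CITY": "PARIS"},
    {"EMP_ID": 105, "EMP_NAME": "LUIS", "CITY": "MADRID"},
]

for row in master_outer_join(EMPLOYEE_FILE_1, EMPLOYEE_FILE_2):
    print(row)
```

Rows 104 and 105 come out with EMP_ID1 set to None, which is exactly the property the filter transformation relies on next.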

The rows passed by the joiner transformation are shown below. For the rows missing in the "Master" source, the EMP_ID1 value is NULL. A filter transformation can therefore be used next to pass only the records having EMP_ID1 as NULL, since these rows correspond to the difference rows from the EMPLOYEE_FILE_2 source. Pass all the rows from the joiner transformation into the filter transformation and add the Filter condition shown below in the Properties tab of the filter transformation.
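The filter step, followed by linking only EMP_ID, EMP_NAME and CITY to the target, can be sketched in Python as follows. The filter is assumed to behave like a condition such as ISNULL(EMP_ID1); the actual condition is in the original screenshot. The joiner output below is hypothetical sample data, and the function names are illustrative.

```python
def filter_difference_rows(joined):
    """Keep rows whose master port EMP_ID1 is NULL (None): rows only in EMPLOYEE_FILE_2.

    Roughly what a Filter condition such as ISNULL(EMP_ID1) expresses (assumed condition).
    """
    return [row for row in joined if row["EMP_ID1"] is None]


def to_target(rows):
    """Project only the ports that are linked to the target definition."""
    return [{p: row[p] for p in ("EMP_ID", "EMP_NAME", "CITY")} for row in rows]


# Hypothetical joiner output (Master Outer Join): EMP_ID1 is None where the master had no match.
joined = [
    {"EMP_ID": 101, "EMP_NAME": "JOHN", "CITY": "NEW YORK", "EMP_ID1": 101},
    {"EMP_ID": 102, "EMP_NAME": "MARY", "CITY": "LONDON", "EMP_ID1": 102},
    {"EMP_ID": 104, "EMP_NAME": "ANNA", "CITY": "PARIS", "EMP_ID1": None},
]

print(to_target(filter_difference_rows(joined)))
# [{'EMP_ID': 104, 'EMP_NAME': 'ANNA', 'CITY': 'PARIS'}]
```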

The records passed by the filter transformation are shown below. Link the EMP_ID, EMP_NAME and CITY ports to the target definition. The complete mapping is shown below. Create a session task and a workflow. After running the workflow, the difference rows loaded into the target relational table are shown below.

Scenario 2: When there are difference rows in both sources, and the sources are either two flat files or a flat file and a relational table. For this illustration, both sources are flat files, and the difference rows in both files are highlighted in red. The target relational table has an additional column, SOURCE_NAME, which indicates the source in which the difference row is present. EMPLOYEE_FILE_1 is again treated as the "Master" source for the unsorted joiner transformation. Because difference records are present in both sources, all the ports from the source qualifier SQ_EMPLOYEE_FILE_1 are passed to the joiner transformation as shown below.

The join condition for the joiner transformation is the same as in the first scenario, but in this case the Join Type is Full Outer Join as shown below. The rows passed by the joiner transformation are shown below. For the rows missing in the "Master" source, the EMP_ID1, EMP_NAME1 and CITY1 values are NULL, while for the rows missing in the "Detail" source, the EMP_ID, EMP_NAME and CITY values are NULL. The filter transformation shown below should only pass the rows having NULL values in the EMP_ID or EMP_ID1 ports, as these correspond to the difference rows in the EMPLOYEE_FILE_1 and EMPLOYEE_FILE_2 flat files respectively.
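The Full Outer Join plus the two-sided NULL filter can be sketched the same way. This is a minimal Python illustration, not Informatica code: rows are dicts, NULL is None, master ports get a "1" suffix as in the text, the filter is assumed to be equivalent to ISNULL(EMP_ID) OR ISNULL(EMP_ID1), and the sample data and function names are hypothetical.

```python
from collections import defaultdict

PORTS = ("EMP_ID", "EMP_NAME", "CITY")


def full_outer_join(master, detail, key="EMP_ID"):
    """All rows from both sources; master ports get a "1" suffix, the missing side is None."""
    index = defaultdict(list)
    for m in master:
        index[m[key]].append(m)
    detail_keys = {d[key] for d in detail}
    joined = []
    for d in detail:
        matches = index.get(d[key], [])
        if matches:
            for m in matches:
                joined.append({**d, **{p + "1": m[p] for p in PORTS}})
        else:  # row missing in the "Master" source: master ports are NULL
            joined.append({**d, **{p + "1": None for p in PORTS}})
    for m in master:
        if m[key] not in detail_keys:  # row missing in the "Detail" source: detail ports are NULL
            joined.append({**{p: None for p in PORTS}, **{p + "1": m[p] for p in PORTS}})
    return joined


def filter_difference_rows(joined):
    """Roughly ISNULL(EMP_ID) OR ISNULL(EMP_ID1) (assumed filter condition)."""
    return [r for r in joined if r["EMP_ID"] is None or r["EMP_ID1"] is None]


# Hypothetical sample data.
EMPLOYEE_FILE_1 = [  # Master
    {"EMP_ID": 101, "EMP_NAME": "JOHN", "CITY": "NEW YORK"},
    {"EMP_ID": 103, "EMP_NAME": "RAVI", "CITY": "MUMBAI"},
]
EMPLOYEE_FILE_2 = [  # Detail
    {"EMP_ID": 101, "EMP_NAME": "JOHN", "CITY": "NEW YORK"},
    {"EMP_ID": 104, "EMP_NAME": "ANNA", "CITY": "PARIS"},
]

for row in filter_difference_rows(full_outer_join(EMPLOYEE_FILE_1, EMPLOYEE_FILE_2)):
    print(row)
```

Row 104 (only in EMPLOYEE_FILE_2) arrives with EMP_ID1 set to None, and row 103 (only in EMPLOYEE_FILE_1) arrives with EMP_ID set to None; the matched row 101 is filtered out.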

The filter condition used in the filter transformation is shown below.

The filter transformation passes the following rows. Now, add an expression transformation after the filter transformation and pass all the ports from the filter transformation to it. The expression transformation should have the following ports in the order shown below. The logic for the output port EMP_ID_OUT is: if the EMP_ID value from the EMPLOYEE_FILE_2 source is NULL, return the EMP_ID1 value from the EMPLOYEE_FILE_1 source; otherwise, return the EMP_ID value from the EMPLOYEE_FILE_2 source. This logic works because, for any row passed by the filter transformation, either the "Master" side or the "Detail" side of the row has NULL values. Similar logic is applied for the EMP_NAME_OUT and CITY_OUT output ports.
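In the expression transformation, one way to write EMP_ID_OUT would be IIF(ISNULL(EMP_ID), EMP_ID1, EMP_ID); this is an assumption, since the exact expression is only in the screenshot. The Python sketch below mirrors that logic for all three output ports on hypothetical filtered rows; the helper names iif_isnull and expression are illustrative.

```python
def iif_isnull(detail_value, master_value):
    """Python stand-in for IIF(ISNULL(detail_port), master_port, detail_port)."""
    return master_value if detail_value is None else detail_value


def expression(row):
    """Compute the output ports from one filtered row of the full outer join."""
    return {
        "EMP_ID_OUT": iif_isnull(row["EMP_ID"], row["EMP_ID1"]),
        "EMP_NAME_OUT": iif_isnull(row["EMP_NAME"], row["EMP_NAME1"]),
        "CITY_OUT": iif_isnull(row["CITY"], row["CITY1"]),
    }


# Hypothetical filtered rows: one side of each row is entirely None (NULL).
filtered = [
    {"EMP_ID": 104, "EMP_NAME": "ANNA", "CITY": "PARIS",
     "EMP_ID1": None, "EMP_NAME1": None, "CITY1": None},        # only in EMPLOYEE_FILE_2
    {"EMP_ID": None, "EMP_NAME": None, "CITY": None,
     "EMP_ID1": 103, "EMP_NAME1": "RAVI", "CITY1": "MUMBAI"},   # only in EMPLOYEE_FILE_1
]

for row in filtered:
    print(expression(row))
```

The test is against None rather than falsiness, so a legitimate value such as 0 or an empty string on the "Detail" side is not mistaken for NULL.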

Another output port, SOURCE_NAME_OUT, is used to determine the source of the difference row. The expression used for this port is shown below. The complete mapping is shown below. Create a session task and a workflow. After running the workflow, the difference rows loaded into the target relational table are shown below.
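Since only difference rows reach the expression transformation, checking a single detail port is enough to tell the sources apart. A minimal Python sketch, assuming an expression along the lines of IIF(ISNULL(EMP_ID), 'EMPLOYEE_FILE_1', 'EMPLOYEE_FILE_2'); the literal source names and the function name source_name_out are hypothetical.

```python
def source_name_out(row):
    """Stand-in for IIF(ISNULL(EMP_ID), 'EMPLOYEE_FILE_1', 'EMPLOYEE_FILE_2') (assumed).

    A NULL detail EMP_ID means the row exists only in EMPLOYEE_FILE_1 (Master);
    otherwise it exists only in EMPLOYEE_FILE_2 (Detail). Valid only for rows
    that have already passed the filter transformation.
    """
    return "EMPLOYEE_FILE_1" if row["EMP_ID"] is None else "EMPLOYEE_FILE_2"


print(source_name_out({"EMP_ID": None, "EMP_ID1": 103}))  # EMPLOYEE_FILE_1
print(source_name_out({"EMP_ID": 104, "EMP_ID1": None}))  # EMPLOYEE_FILE_2
```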

Scenario 3: When there are difference rows in two relational tables residing in the same database. For this illustration, two relational tables, EMPLOYEE_TABLE_1 and EMPLOYEE_TABLE_2, having the same definition are considered. The rows present in both tables are shown below, and the difference rows are highlighted in red. Import the table definition of any one source in the Source Analyzer. Here, the EMPLOYEE_TABLE_1 source definition is imported.

A simple SQL query that returns the difference rows in EMPLOYEE_TABLE_1 is given below.

SELECT EMP_ID, EMP_NAME, CITY
FROM EMPLOYEE_TABLE_1 EMP_1
WHERE NOT EXISTS (SELECT EMP_ID FROM EMPLOYEE_TABLE_2 EMP_2 WHERE EMP_1.EMP_ID = EMP_2.EMP_ID)

The above query can be modified to return the difference rows between the two source tables and also the table name where the difference row is present.

SELECT EMP_ID, EMP_NAME, CITY, 'EMPLOYEE_TABLE_1' AS SOURCE_NAME
FROM EMPLOYEE_TABLE_1 EMP_1
WHERE NOT EXISTS (SELECT EMP_ID FROM EMPLOYEE_TABLE_2 EMP_2 WHERE EMP_1.EMP_ID = EMP_2.EMP_ID)
UNION
SELECT EMP_ID, EMP_NAME, CITY, 'EMPLOYEE_TABLE_2' AS SOURCE_NAME
FROM EMPLOYEE_TABLE_2 EMP_2
WHERE NOT EXISTS (SELECT EMP_ID FROM EMPLOYEE_TABLE_1 EMP_1 WHERE EMP_1.EMP_ID = EMP_2.EMP_ID)

The alias column 'SOURCE_NAME' gives the source table name of the difference row.
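The UNION query can be tried out before it goes into the Source Qualifier. The sketch below runs it unchanged against an in-memory SQLite database using Python's sqlite3 module; SQLite stands in for the real database here (an assumption for illustration), and the table rows are hypothetical.

```python
import sqlite3

DIFF_QUERY = """
SELECT EMP_ID, EMP_NAME, CITY, 'EMPLOYEE_TABLE_1' AS SOURCE_NAME
FROM EMPLOYEE_TABLE_1 EMP_1
WHERE NOT EXISTS (SELECT EMP_ID FROM EMPLOYEE_TABLE_2 EMP_2 WHERE EMP_1.EMP_ID = EMP_2.EMP_ID)
UNION
SELECT EMP_ID, EMP_NAME, CITY, 'EMPLOYEE_TABLE_2' AS SOURCE_NAME
FROM EMPLOYEE_TABLE_2 EMP_2
WHERE NOT EXISTS (SELECT EMP_ID FROM EMPLOYEE_TABLE_1 EMP_1 WHERE EMP_1.EMP_ID = EMP_2.EMP_ID)
"""


def difference_rows(conn):
    """Run the difference query and return its rows in a stable (sorted) order."""
    return sorted(conn.execute(DIFF_QUERY).fetchall())


conn = sqlite3.connect(":memory:")
for table in ("EMPLOYEE_TABLE_1", "EMPLOYEE_TABLE_2"):
    conn.execute(f"CREATE TABLE {table} (EMP_ID INTEGER, EMP_NAME TEXT, CITY TEXT)")

# Hypothetical rows: 103 exists only in table 1, 104 only in table 2.
conn.executemany("INSERT INTO EMPLOYEE_TABLE_1 VALUES (?, ?, ?)",
                 [(101, "JOHN", "NEW YORK"), (102, "MARY", "LONDON"), (103, "RAVI", "MUMBAI")])
conn.executemany("INSERT INTO EMPLOYEE_TABLE_2 VALUES (?, ?, ?)",
                 [(101, "JOHN", "NEW YORK"), (102, "MARY", "LONDON"), (104, "ANNA", "PARIS")])

for row in difference_rows(conn):
    print(row)
# (103, 'RAVI', 'MUMBAI', 'EMPLOYEE_TABLE_1')
# (104, 'ANNA', 'PARIS', 'EMPLOYEE_TABLE_2')
```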

Add the above query to the Sql Query attribute of the source qualifier transformation as shown below. This overrides the default SQL query issued when the session runs. Ensure that the order of the columns in the SQL query matches the order of the ports in the Source Qualifier.
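Because the columns returned by the override are matched to the Source Qualifier ports by position, a quick pre-check can compare the query's result-column names against the intended port order. This is a hypothetical helper, not an Informatica feature: it uses Python's sqlite3 Cursor.description on an in-memory SQLite table, and the SQ_PORTS list is an assumed port order. Comparing names is only a proxy for the positional mapping.

```python
import sqlite3


def override_matches_ports(conn, sql, sq_ports):
    """True if the query's result columns are named, in order, like the SQ ports.

    Cursor.description holds one tuple per result column; element 0 is the name.
    Names are compared case-insensitively.
    """
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    return [c.upper() for c in columns] == [p.upper() for p in sq_ports]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE_TABLE_1 (EMP_ID INTEGER, EMP_NAME TEXT, CITY TEXT)")

SQ_PORTS = ["EMP_ID", "EMP_NAME", "CITY", "SOURCE_NAME"]  # hypothetical port order
good = "SELECT EMP_ID, EMP_NAME, CITY, 'EMPLOYEE_TABLE_1' AS SOURCE_NAME FROM EMPLOYEE_TABLE_1"
bad = "SELECT EMP_NAME, EMP_ID, CITY, 'EMPLOYEE_TABLE_1' AS SOURCE_NAME FROM EMPLOYEE_TABLE_1"

print(override_matches_ports(conn, good, SQ_PORTS))  # True
print(override_matches_ports(conn, bad, SQ_PORTS))   # False
```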

The value of the alias column 'SOURCE_NAME' also needs to be passed to the target. For this purpose, a new port SOURCE_NAME is created in the source qualifier as shown below. By default, all ports in the source qualifier are input/output ports, and if the port SOURCE_NAME is not linked, the session fails with the error: TE_7020 Internal error. The Source Qualifier [SQ_EMPLOYEE_TABLE_1] contains an unbound field [SOURCE_NAME]. Hence, the new SOURCE_NAME port should be linked to a field from the source definition having the same datatype (the varchar2 datatype in the source definition becomes string in the source qualifier). Link the CITY port from the source definition to the SOURCE_NAME port in the source qualifier. As a good practice, use an expression transformation between the source qualifier and the target definition as shown below.

Create a session task and a workflow. After running the workflow, the difference rows loaded into the target relational table are shown below.
