Informatica - Complex Scenarios and their Solutions

Author(s): Aatish Thulasee Das, Rohan Vaishampayan, Vishal Raj

Date written (MM/DD/YY): 07/01/2003

Declaration
We hereby declare that this document is based on our personal experiences and/or the experiences of our project members. To the best of our knowledge, this document does not contain any material that infringes the copyrights of any other individual or organization, including the customers of Infosys.

Aatish Thulasee Das, Rohan Vaishampayan, Vishal Raj

Project Details
• Projects involved: REYREY
• H/W Platform: 516 RAM, Microsoft Windows 2000
• S/W Environment: Informatica
• Appln. Type: ETL tool
• Project Type: Data warehousing

Target readers: Data warehousing team using ETL tools

Keywords
ETL Tools, Informatica, Data Warehousing

INDEX

Informatica - Complex Scenarios and their Solutions
    Author(s)
    Date written
    Declaration
    Project Details
    Target readers
    Keywords
Introduction
Scenarios:
1. Performance problems when a mapping contains multiple sources and targets
    1.1 Background
    1.2 Problem Scenario
    1.3 Solution
2. When source data is Flat File
    2.1 Background
    2.2 Problem Scenario
    2.3 Solution
3. Extracting data from the flat file containing nested record sets
    3.1 Background
    3.2 Problem Scenario
    3.3 Solution
4. Too Large Lookup Tables
    4.1 Background
    4.2 Problem Scenario
    4.3 Our Solution
5. Complex logic for Sequence Generation
    5.1 Background
    5.2 Problem Scenario
    5.3 Our Solution

Introduction

This document is based upon the learning that we had during our work on the project 'Reynolds and Reynolds' in CAPS (PCC), Pune. We have come up with best practices to overcome the complex scenarios we faced during the ETL process. This document also describes some common best practices to follow while developing mappings.

Scenarios:

1. Performance problems when a mapping contains multiple sources and targets

1.1 Background
In Informatica, multiple sources can be mapped to multiple targets within a single mapping. This property is quite useful for keeping related mappings in one place: all the related loading takes place in one go, which reduces the number of sessions that have to be created. It is quite logical to group different sources and targets that share the same logic into the same mapping.

1.2 Problem Scenario
In the multiple-target scenario, a single database connection handles multiple database statements. If some of the sub-mappings contain complex transformations, performance is degraded drastically: a performance problem in one sub-mapping makes the other sub-mappings suffer the degradation as well. Such a mapping is also difficult to manage.

1.3 Solution
Divide and Rule. It is always better to divide a complex mapping (i.e. multiple sources and targets) into simple mappings with one source and one target. Each session will then establish its own connection, and the server can handle all the requests in parallel against the multiple targets. All the related mappings can be executed in parallel in different sessions, and each session can be placed into a Batch and run in 'CONCURRENT' mode. That will greatly help in managing the mappings.
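The divide-and-rule pattern can be sketched in plain code as well: each one-source/one-target load runs as its own "session" on its own connection, so the loads proceed in parallel. Below is a minimal Python sketch of the idea; the table names and the body of run_session are hypothetical placeholders, not details from the actual project.

```python
import threading
import time

def run_session(source, target):
    """One simple mapping: one source, one target, its own connection."""
    # A real session would open its own database connection here;
    # the sleep stands in for the actual extract/transform/load work.
    time.sleep(1)
    print(f"Loaded {source} -> {target}")

# One session per simple mapping, all started together,
# like sessions placed in a Batch running in CONCURRENT mode.
mappings = [("src_dealers", "tgt_dealers"), ("src_activity", "tgt_activity")]
sessions = [threading.Thread(target=run_session, args=m) for m in mappings]
for s in sessions:
    s.start()
for s in sessions:
    s.join()
```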

2. When source data is Flat File

2.1 Background
What is a Flat File? A flat file is one in which table data is gathered in lines of ASCII text, with the value from each table cell separated by a delimiter or a space and each row represented by a new line. Below is the sample flat file which was used during the project.

Fig 2.1: In_Daily - Flat File.

2.2 Problem Scenario
When the above flat file was loaded into Informatica, the Source Analyzer appeared as shown below.

Fig 2.2: In_Daily - Flat File after loading into Informatica.

Two issues were encountered while loading the above flat file:

1. The data types of the fields from the flat file and the respective fields of the target tables were not matching. For example, refer to Fig 2.1: in the first row, i.e. the record corresponding to "BH", the fourth field has the data type "Date", while in the third row, i.e. the record corresponding to "CR", the fourth field is "Char" and the corresponding field in the target table has the data type "Char". The same field position thus carried different data types depending on the record type.

2. The sizes of the fields from the flat file and the respective fields of the target tables were not matching. For example, refer to Fig 2.1: in the eighth row, i.e. the record corresponding to "QR", the fifth field has a field size of 100, but after the loading process the Source Analyzer showed the size of the field as 45 (as shown in Fig 2.2); also, the fifth field corresponding to "CR" has a size of 5, while the corresponding field in the target table has a size of 100.

2.3 Solution
Following is the solution which we incorporated to solve the above problem:

1. Since the data was so heterogeneous, we decided to keep all the data types in the Source Qualifier as "String" and converted them as per the fields to which they were getting mapped.

2. Regarding the size of the fields, we changed the size to the maximum possible size; for example, in the case mentioned above, the field was sized to 100, the largest of the sizes involved.
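The "read everything as String, convert per target field" approach from point 1 above can be illustrated in plain code. Below is a minimal Python sketch; the delimiter, the record layouts, and the date format are assumptions for illustration, not taken from the actual file definition.

```python
from datetime import datetime

# Hypothetical per-record-type conversions: every field arrives as a string
# (as in the Source Qualifier) and is converted only where the target needs it.
CONVERTERS = {
    "BH": {3: lambda s: datetime.strptime(s.strip(), "%m/%d/%Y")},  # 4th field: Date
    "CR": {},  # 4th field stays Char, so no conversion is required
}

def parse_line(line, delimiter="|"):
    fields = line.rstrip("\n").split(delimiter)   # everything is a string here
    record_type = fields[0]
    for index, convert in CONVERTERS.get(record_type, {}).items():
        fields[index] = convert(fields[index])    # convert per mapped target field
    return record_type, fields

record_type, fields = parse_line("BH|1001|DAILY|07/01/2003")
print(record_type, fields)
```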

3. Extracting data from the flat file containing nested record sets

3.1 Background
The flat file shown in the previous section (Fig 2.2) contains nested record sets. Here the data is in three levels. The first level of data contains the Batch File information, starting with the record BH and ending with the record BT. The second level of data contains the Dealer records in the batch file, starting with the record DH and ending with DT (i.e. the 2nd and 14th rows in the flat file shown above); these contain the details of the different dealers. The third level of data contains the information on the different activities for a particular dealer. To explain the nested formation, the records of the above file are restructured in Fig 3.1.

Fig 3.1: In_Daily - Flat File restructured in the Nested form (Levels 1, 2 and 3).

3.2 Problem Scenario
The data required for loading was in a form such that a single row should consist of the dealer details as well as the different activities done by that particular dealer. Both pieces of data had to be concatenated to form a single piece of information to load into a single row of the target table.

3.3 Solution
In this particular kind of scenario, the dealer information (second-level data) should be stored into variables by putting a condition that identifies the dealer information. For the rows of the flat file containing only the second-level data (i.e. dealer information), the data is stored in the variables and the row itself should be filtered out in the next transformation. For the dealer's activity data (third-level data), each row should be passed on to the next transformation together with the dealer information that was stored in the variables during the previous row's load. The same is done here.
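The variable technique of section 3.3 can be sketched in plain code: dealer rows set a held value and are filtered out, while activity rows are emitted concatenated with the most recently stored dealer information. This is a minimal Python sketch; the record layouts and the "AC" activity record type are hypothetical.

```python
def flatten(rows):
    """Flatten a nested record set: one output row per activity,
    carrying the enclosing dealer's details."""
    dealer = None                       # plays the role of the mapping variables
    for fields in rows:
        record_type = fields[0]
        if record_type == "DH":         # second-level data: store it, filter the row out
            dealer = fields[1:]
        elif record_type == "AC" and dealer is not None:
            # third-level data: pass through with the stored dealer info
            yield dealer + fields[1:]   # one target row = dealer detail + activity

rows = [
    ["DH", "D001", "Smith Motors"],
    ["AC", "SALE", "250"],
    ["AC", "SERVICE", "90"],
]
for target_row in flatten(rows):
    print(target_row)   # ['D001', 'Smith Motors', 'SALE', '250'], ...
```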

4. Too Large Lookup Tables

4.1 Background
What is a Lookup Transformation? A Lookup transformation is used in a mapping to look up data in a relational table, view, or synonym (see Fig 4.1). A lookup definition can be imported from any relational database to which both the Informatica Client and Server can connect. Multiple Lookup transformations can be used in a mapping. The Informatica Server queries the lookup table based on the lookup ports in the transformation and compares the Lookup transformation port values to the lookup table column values based on the lookup condition (see Fig 4.2). The result of the lookup is passed to other transformations and to the target. (The actual screens are attached for reference.)

The Lookup transformation can be used to perform many tasks, including:

• Get a related value. For example, if the source table includes an employee ID, but you want to include the employee name in the target table to make the summary data easier to read.
• Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).
• Update slowly changing dimension tables. A Lookup transformation can be used to determine whether records already exist in the target.

Fig 4.1: LOOKUP is a kind of Transformation.

Fig 4.2: The Lookup Conditions to be specified in order to get Lookup Values.

4.2 Problem Scenario
In the project, one of the mappings had large lookup tables that were hampering the performance of the mapping, as:

a. they were consuming a lot of cache memory unnecessarily, and
b. more time was spent in searching for a relatively small number of values in a large lookup table.

Thus the loading of data from the source table(s) to the target table(s) was unnecessarily consuming more time than it normally should.

4.3 Our Solution
We eliminated the first problem by simply using the lookup table as one of the source tables itself (see Fig 4.3). The source tables and target tables are not cached in Informatica, and hence it made sense to use the large lookup table as a source. This also ensured that cache memory would not be wasted unnecessarily and could be used for other tasks.

Fig 4.3: The Mapping showing the use of the Lookup table as a Source table (multiple source tables joined in the Source Qualifier).

After using the lookup table as a source, we used a join condition (user-defined join) in the Source Qualifier. This reduced the searching time taken by Informatica, as the number of rows to be searched was drastically reduced: the join condition takes care of the excess rows which would otherwise have been present in the Lookup transformation. Thus the second problem was also successfully eliminated.

Fig 4.4: The use of the Join condition (SQL to join the tables, as a User Defined Join) in the Source Qualifier.
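The effect of moving the lookup into the source can be illustrated in plain code: the user-defined join keeps only the matching rows up front instead of probing a large cached lookup once per row. Below is a minimal Python sketch with hypothetical table contents and key names; the comment shows the kind of SQL join the Source Qualifier override would express.

```python
# Hypothetical source rows and (former) lookup rows.
source_rows = [{"emp_id": 1, "sales": 500}, {"emp_id": 2, "sales": 300}]
lookup_rows = [{"emp_id": 1, "emp_name": "A"}, {"emp_id": 2, "emp_name": "B"}]

# Equivalent of a user-defined join in the Source Qualifier, e.g.:
#   SELECT ... FROM src, lkp WHERE src.emp_id = lkp.emp_id
# The join keeps only matching rows, so no large lookup cache is built
# and no per-row search over excess rows takes place.
by_key = {row["emp_id"]: row for row in lookup_rows}
joined = [{**src, **by_key[src["emp_id"]]}
          for src in source_rows if src["emp_id"] in by_key]
print(joined)
```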

5. Complex logic for Sequence Generation

5.1 Background
What is a Sequence Generator? A Sequence Generator is a transformation that generates a sequence of numbers once you specify a starting value and the increment by which to increment that starting value (see Fig 5.2). (The actual screens are attached for reference.)

Fig 5.1: The Sequence Generator is a kind of Transformation.

Fig 5.2: The Transformation details to be filled in order to generate a sequence.

5.2 Problem Scenario
In the project, one of the mappings had two requirements, viz.:

a. During the transfer of data to a column of a target table, the Sequence Generator was required to trigger only selectively. For example, the values to be loaded in the column of the target table were either sequence-generated or obtained from a lookup table. But as per its property, the Sequence Generator is triggered every time a row gets loaded into the target table. So whenever the lookup condition returned a value, that value would populate the target table, but at the same time the Sequence Generator would also trigger and hence increment its CURRVAL (current value; see Fig 5.1) by 1. So when the next value was loaded into the column of the target table, the difference between the sequence-generated values would be 2 instead of 1. Thus the generated sequence would not be continuous, and there would be gaps or holes in the sequence.

b. Another requirement was that the sequence of numbers generated by the Sequence Generator had to be in order.

5. If the lookup table returns a null value then a row would get populated in the first instance of the target table and in this case the sequence generator would trigger and its value would get loaded in the column of the Target Table. And all the other values for the remaining columns in the Target Table were filtered on the basis of the value returned from the Lookup Table i.e. In order to prevent the sequence generator from triggering we created two instances of the same target table.2 Our Solution: A basic rule for the Sequence Generator is that if a row gets loaded into the Target table the sequence generator gets triggered.3) whereas the value returned from the Lookup Table (if any) was mapped to the same column in the Target table in the second instance (See Fig5.3).3) Sequence Generator Lookup Table Target Table (Second Instance) Target Table (First Instance) Fig 5. (See Fig 5. . if the lookup table returned a value then a row in the second instance of the target table would get populated and thus the sequence generator wont be triggered.3: The Mapping showing two instances of the same Target table. The sequence generator was mapped to the column in the Target Table in the first instance (See Fig 5.

Thus, by achieving control over the triggering of the Sequence Generator, we could avoid the "holes" or gaps in the sequence it generated.
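The routing logic behind the two target instances can be sketched in plain code: the counter advances only when a row actually goes to the instance fed by the Sequence Generator, so the sequence stays continuous. This is a minimal Python sketch; the lookup results are hypothetical sample data.

```python
import itertools

sequence = itertools.count(start=1)        # the Sequence Generator
first_instance, second_instance = [], []   # two instances of the same target table

# One entry per incoming row: a looked-up value, or None for a lookup miss.
lookup_results = ["L-10", None, None, "L-42", None]
for looked_up in lookup_results:
    if looked_up is None:
        # Lookup returned null: route to the first instance,
        # the only place where the Sequence Generator triggers.
        first_instance.append(next(sequence))
    else:
        # Lookup hit: route to the second instance; no sequence trigger.
        second_instance.append(looked_up)

print(first_instance)   # [1, 2, 3] - continuous, no holes
print(second_instance)  # ['L-10', 'L-42']
```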