INFORMATICA

COOKBOOK

INFORMATICA DEVELOPER’S GUIDE

Author: Sastry Kolluru
Creation Date:
Last Modified:
Version: 1.00

Approvals: Stephen Musgrove

Informatica Cookbook

Change Record
DATE         AUTHOR          VERSION  REFERENCE
19-Apr-2004  Sastry Kolluru  1.00     Added section 7.7 and 7.8

Reviewers
NAME POSITION

Version 1.00

Page 2 of 19


Table of Contents
1.0 OVERVIEW
2.0 GETTING STARTED
  2.1 ABOUT INFORMATICA
    2.1.1 Version in use
3.0 INFORMATICA DEVELOPMENT CYCLE
  3.1 STARTING A NEW PROJECT
    3.1.1 Project Initialization
    3.1.2 Login
    3.1.3 Folders and Groups setup
  3.2 DEVELOPMENT AND TESTING PROCESS
  3.3 MIGRATION TO PRODUCTION
    3.3.1 Information to be provided
    3.3.2 Review before movement
  3.4 CHANGES TO AN EXISTING PROJECT
4.0 TRANSITION OF PROJECTS FOR SUPPORT
  4.1 REQUIREMENTS FOR SUPPORT
  4.2 SUPPORT PROCESS ON FAILURE
  4.3 SUPPORT WINDOW
5.0 INFORMATICA ENVIRONMENTS
  5.1 DEVELOPMENT
  5.2 PRODUCTION
6.0 ENGINE MANAGEMENT
  6.1 MANAGING THE ENGINE
  6.2 RESTARTING THE ENGINE
7.0 BEST PRACTICES
  7.1 NAMING STANDARDS
    7.1.1 Challenge
    7.1.2 Description
  7.2 TEMPLATES
    7.2.1 Challenge
    7.2.2 Description
  7.3 USAGE OF CONNECTION OBJECTS
    7.3.1 Challenge
    7.3.2 Description
  7.4 FAILURE SCRIPTS
    7.4.1 Challenge
    7.4.2 Description
  7.5 TRUNCATING DATA
    7.5.1 Challenge
    7.5.2 Description
  7.6 BUILT-IN RE-STARTABILITY
    7.6.1 Challenge
    7.6.2 Description
  7.7 PROJECT DIRECTORY STRUCTURE IN UNIX
    7.7.1 Challenge
    7.7.2 Description
  7.8 PARAMETERIZATION OF SESSION INFORMATION
    7.8.1 Challenge
    7.8.2 Description


1.0 OVERVIEW
The objective of the Informatica Cookbook is to provide the Informatica user community at Fidelity Investments with information regarding:
• Informatica infrastructure at FEB
• Processes for the development life cycle
• Best practices, tips and techniques

The cookbook is intended as a starting point for developers, so that they understand the standards, processes and best practices before starting work on the FEB Informatica infrastructure. It also acts as a refresher for experienced developers on best practices and on lessons learned by other users. We hope to update this document regularly to incorporate better practices and new lessons.

2.0 GETTING STARTED

2.1 ABOUT INFORMATICA
Informatica PowerCenter is a data integration platform for building, deploying, and managing enterprise data warehouses, and other data integration projects. Informatica PowerCenter enables users to easily transform data from disparate enterprise systems and sources into reliable information to support strategic business initiatives.

2.1.1 Version in use
The versions of Informatica currently running are 5.1 and 6.2. All new development should be done in version 6.2. All projects currently in Informatica 5.1 will be migrated to Informatica 6.2.

3.0 INFORMATICA DEVELOPMENT CYCLE

3.1 STARTING A NEW PROJECT

3.1.1 Project Initialization
A mail has to be sent to the Informatica Support team before the start of any project. The mail should contain the following information:
1. Project Name
2. Project Contact
3. Folder Name
4. List of users accessing the folder
5. Informatica version planned to be used
6. Expected date of moving to Production
7. Expected number of sessions/mappings in the project

A minimum of 5 days' notice should be given before code is to be moved to production, so that the move can be planned.

3.1.2 Login

Every user should have a login in development as well as in production. The Corp id will be used as the login for individual users. In development, users will be given access to both create and execute mappings/sessions, whereas in production only read access will be given. The request for a new login may come as part of the project initialization mail, or a separate mail may be sent to the Informatica Support Group. A selective execute privilege can be requested for some sessions or workflows.

3.1.3 Folders and Groups setup
Folders will be created based on the information provided to the Informatica Support Group as part of the project initialization process. Groups will be set up in Informatica to manage the access of users to the various folders.

3.2 DEVELOPMENT AND TESTING PROCESS

All development should be done in the development instance of Informatica and Oracle. Separate folders should be created in the same development repository for development, QA and SIT. The folders marked as <folder_name>_prod will be moved to production on request. The naming convention to be followed will be as described in the Naming Convention best practice in section 7.1.

3.3 MIGRATION TO PRODUCTION

3.3.1 Information to be provided
After coding and testing have been completed in development, the following information should be provided to the Informatica Development Team to facilitate the movement of code. The same applies to enhancements/bug fixes to existing mappings/sessions.
1. Project name
2. Folder in development
3. List of session/mapping names
4. If any scripts need to be moved, the list of those scripts
5. Date when the movement has to be made

3.3.2 Review before movement
The Informatica Support group will review mappings and sessions before they are moved from development to production. The following are some of the important points:
1. Check if existing database connectors/FTP connectors/external loaders have been used
2. Check if the failure scripts have been added; refer to the Best Practices section for more details
3. Location of scripts/intermediate files and any data files
4. Location of lookup and other caches
5. If any intermediate files are being generated as part of the process, they should be deleted at the end of the process
6. Check if any existing code or setting can cause a known bug
7. Check if the scheduling could affect the performance of existing sessions
8. Suggest process improvements to improve efficiency. Any project team can approach the Informatica Support Team during the initial stages of the project for a process review. If the methods used in the code may affect the existing systems, the code will not be moved into production.
9. Restartability
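Review point 5 (intermediate files deleted at the end of the process) can be built into the wrapper script itself. The following is a minimal sketch, not an existing FEB script: the function name run_with_cleanup and the work directory prefix etl_work are hypothetical. It runs a command with a private work directory and removes that directory whether the command succeeds or fails.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a scratch directory that is
# always removed afterwards, even if the command fails.
# Usage: run_with_cleanup <command> [args...]
# The scratch directory path is passed to the command as its last argument.
run_with_cleanup() {
    work_dir=$(mktemp -d "${TMPDIR:-/tmp}/etl_work.XXXXXX") || return 1
    (
        # The EXIT trap fires when this subshell ends, on success or failure.
        trap 'rm -rf "$work_dir"' EXIT
        "$@" "$work_dir"
    )
}

# Example (commented out): my_extract_step would write its intermediate
# files under the directory it receives as its last argument.
# run_with_cleanup my_extract_step
```

The exit status of the wrapped command is passed through, so a calling session or script still sees a failure.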

3.4 CHANGES TO AN EXISTING PROJECT

If enhancements/bug fixes have been made to existing mappings/sessions, they need to be tested in development and the following information should be provided to the Informatica Support Team:
1. Project name
2. Folder in development
3. List of session/mapping names
4. If any scripts need to be moved, the list of those scripts
5. Date when the movement has to be made

4.0 TRANSITION OF PROJECTS FOR SUPPORT

4.1 REQUIREMENTS FOR SUPPORT

1. An operations document on the functionality of the sessions/mappings to be supported.
2. A re-start and recovery document explaining the actions to be taken on failure, for every session/batch. It is recommended to create cleanup scripts so as to avoid unnecessary manual intervention.
3. Information on support contacts, so that they can be contacted if needed. A primary contact and a secondary contact should be provided.

4.2 SUPPORT PROCESS ON FAILURE

1. On failure the session will send out a mail/page to the support team. The Informatica support team shall follow the re-start and recovery process provided.
2. A mail will be sent to the primary and secondary contacts summarizing the reason for the failure and the action taken.

4.3 SUPPORT WINDOW
Support for Informatica jobs will be provided during the following hours:

Monday to Friday
  OnSite Support     9:00AM to 6:00PM
  OffShore Support   11:00PM to 6:00PM

Saturday/Sunday and Holidays
  OffShore Support   11:00PM to 6:00PM

5.0 INFORMATICA ENVIRONMENTS

5.1 DEVELOPMENT
The Informatica Development Engine is set up on webstatdev. The repository is in Oracle and is hosted on smmk94 so that backups of it are taken from time to time. There are development instances in versions 5.1 and 6.2.

PowerCenter 5.1
Repository name – EsiteDev
Repository Database -

PowerCenter 6.2
Repository name – PMNEW
Host Name - webstatdev
Port number - 5031

5.2 PRODUCTION
The Informatica Production Engine is set up on smmk94. The repository is in Oracle and is also hosted on smmk94. There are production instances in versions 5.1 and 6.2.

PowerCenter 5.1
Repository name – EsiteProd
Repository Database -

PowerCenter 6.2
Repository name – eSite62test
Host Name - smmk94
Port number - 5031

6.0 ENGINE MANAGEMENT

6.1 MANAGING THE ENGINE

The development and production engines shall be managed by the Informatica support Team. Information regarding planned downtime/ upgrades shall be provided to the user community from time to time.

6.2 RESTARTING THE ENGINE

A mail shall be sent to the user community before the engine is re-started. After the engine has been brought up, a confirmation will be sent so that users can double-check the status of their sessions. If the sessions have not been scheduled properly, the users should inform the Informatica support team.

7.0 BEST PRACTICES

7.1 NAMING STANDARDS
7.1.1 Challenge
Define standards to be used during development in Informatica.

7.1.2 Description
Folders
Folders are a collection of mappings, sources, targets, sessions, and batches.
Syntax: ProjectName_phase
Description:
  Phase
    ‘DEV’ - Development
    ‘SIT’ - Integration Testing
    ‘UAT’ - Acceptance Testing
    ‘PROD’ - Production
  ProjectName
    Acronym of Group Project
Note: not all phases may be required by each development group. Additional folders can be created to meet the testing needs of the development teams.

Ports
Ports are another name for fields. There are many kinds of ports: Input, Output, Variable, Lookup etc.
Variable port names begin with the ‘v_’ prefix.
Output ports that have been added during coding should begin with the ‘o_’ prefix.
All other port names are at the discretion of the programming team.

Transforms
The names of these objects should describe what the transform does. Be as clear and concise as possible. Prefixes are:
exp_ - Expressions
jnr_ - Joiners
fil_ - Filters
lkp_ - Lookups
agg_ - Aggregators
seq_ - Sequence Generator
sq_ - Source Qualifier
upd_ - Update Strategy
sp_ - Stored Procedure
nrm_ - Normalizer
rnk_ - Rank
rtr_ - Router
xsq_ - XML Source Qualifier
srt_ - Sorter

Sources and Targets
For database tables, default Source and Target names are derived from the ODBC data source name and the table name/view name of the object in the DBMS. For files, default Source names are derived from FLATFILE:name of file.

Mappings
There are no standards for this category of object. However, it is strongly suggested NOT to use the default name. It is suggested that all mappings begin with the letter m.

Sessions/Batches and workflows
Sessions and Batches are the descriptive components that wrap the mappings and provide the detail regarding how, when and with what sources/targets to use during a mapping execution.

Syntax: Qualifier_Batch/SessionName
Description:
  Qualifier
    ‘s’ for Session
    ‘b’ for Batch
    ‘wf’ for workflow
    ‘wl’ for worklet
  Batch/SessionName
    Free form text, usually the Mapping Name without the prefix ‘m’.

Database Connections at the Server
The PowerMart™ engine requires database connections on the machine on which the engine is running. In order to establish clear connection names the following standard should be used:

For Oracle Connections:
Syntax: database_LogonID
Description:
  database - The Oracle Schema
  LogonID - The user id to use when logging into the source/target
Example: CAP1_powerm

For Sybase Connections:
Syntax: server_database_LogonID
Description:
  Server - The server name
  LogonID - The user id to use when logging into the source/target
Example: dbp1_powerm

For MS-SQLServer Connections:
Syntax: Server_Database_LogonID
Description:
  Database - The Database name
  LogonID - The user id to use when logging into the source/target
Example: dbp1_powerm

External loader at the Server
The PowerMart™ engine requires an external loader on the machine on which the engine is running in order to use bulk loading utilities to load data to databases. In order to establish clear loader names the following standard should be used:

For Oracle loader:
Syntax: SQLLDR_Schema_LogonID
Description:
  Schema - The Oracle Schema
  LogonID - The user id to use when logging into the source/target
Example: CAP1_powerm
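The naming rules above are mechanical enough to check with a small script before a review. The sketch below is illustrative only: the function names are hypothetical, and it covers just the transform prefixes and the Oracle connection syntax (database_LogonID) listed in this section.

```shell
#!/bin/sh
# Hypothetical helpers for the naming standards in section 7.1.

# Succeeds if the transform name starts with one of the documented
# prefixes and has at least one character after the prefix.
has_transform_prefix() {
    case $1 in
        exp_?*|jnr_?*|fil_?*|lkp_?*|agg_?*|seq_?*|sq_?*) return 0 ;;
        upd_?*|sp_?*|nrm_?*|rnk_?*|rtr_?*|xsq_?*|srt_?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Builds an Oracle connection name: database_LogonID.
# $1 = Oracle schema, $2 = logon id
oracle_connection_name() {
    printf '%s_%s\n' "$1" "$2"
}

# Example (commented out):
# has_transform_prefix lkp_customer && echo "name follows the standard"
# oracle_connection_name CAP1 powerm
```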

7.2 TEMPLATES
7.2.1 Challenge
Develop a method by which the code in Informatica can be documented so that it is easy for development and transitioning to a support team.

7.2.2 Description
A template document has been created to document the logic in the Informatica transforms. This document will be a master list of all activities to be done. One template document will be created for every mapping. The template document consists of the following sections:

Setup
This section contains the details of the source and target, the intermediate data elements and any comments at the template level.

Process overview
This section consists of a pictorial representation of the mapping to clarify the data flow.

Target to source mapping
This section has details of the transformations to be done between the source and the target fields. These transformations are mapped with respect to each target field.

Error handling
This section contains the error conditions and the actions to be taken for each of the error conditions.

Re-start and Recovery
This section details the restart and recovery strategy in case of failure.

Setup
Setup has the following details:
1. Mapping Name - The name of the mapping document.
2. Description - Any detailed description found necessary for the document.
3. Source - Details the source for the mapping.
4. Target - Details the target for the mapping.
5. Initial Rows - The average number of records expected to be processed; this will be used for database size estimation and load window.
6. Load Frequency - The frequency of loads; this could be daily, weekly, monthly etc.
7. Load Window - The time period during which the upload will take place.
8. Pre-processor - The activities to be done before processing the transformations. Any specific checks will have to be added here.
9. Post Processing - The activities to be done after the transformation process is complete. Any specific checks will have to be added here.
10. Remarks - Any remarks applicable at the mapping level.

Sources
1. Tables - The source table name, the schema/owner name and any filter condition to be applied to the table. If multiple tables are present then all the table names will have to be added. The relationship between the tables will be provided in the relationship column.
2. File - The source file name, the location of the file, the file type, the file format, the relationship between various files and information regarding the presence of a header and footer.

Target
1. Tables - The target table name and the schema/owner name. If multiple tables are present then all the table names will have to be added. The relationship between the tables will be provided in the relationship column.
2. File - The target file name, the location of the file, the file type, the file format, the relationship between various files and information regarding the presence of a header and footer.

Lookups
1. Look up name - The name of the lookup.
2. Lookup Table - The source of data.
3. Table Owner - The owner of the table.
4. Lookup Columns - The columns that are to be included in the lookup.
5. Filter - The condition to be applied to the data to be fetched from the table.
6. Comments - The context of usage of the lookup.

Source to target mapping
1. Target Table name - The table name of the ODS table.
2. Target field name - Field name in the target table.
3. Target datatype - The datatype of the target field.
4. Target mandatory - To indicate if the field is mandatory.
5. Default value - The default value if the field is null.
6. Source Table/File name - The table/file name of the source.
7. Source field name - Field name in the source.
8. Comments and detailed transformations - The details of all transformations to be done.

Error Handling
Any specific error handling needs can be specified in this section of the template.

Re-start and Recovery
Any recovery needs of the mapping should be described in this section. If any special script needs to be run or data needs to be deleted before re-running a session, it should be described here.

7.3 USAGE OF CONNECTION OBJECTS

7.3.1 Challenge
Define and Use connection objects like database connectors, FTP connections and external loader connections so that redundancies are eliminated and management of these objects becomes easy.

7.3.2 Description
• When connecting to the database, the administrative user should not be used; an application-specific batch user should be used.
• The naming convention to be followed is as specified in the naming standards section, 7.1.
• The name of the connection object in QA and production should be the same.
• When using the external loader, for the external loader executable name, instead of using /webstatmmk1/oracle/product/9.2.0.2/bin/sqlldr use the shell script /webstatmmk1/ia/pm47/sh_load or /webstatmmk1/ia/pm47/sh_load_parallel_direct.

7.4 FAILURE SCRIPTS
7.4.1 Challenge
Develop a mechanism by which errors can be tracked and comprehended

7.4.2 Description
Implementation of the failure script
Failure Scripts in Informatica 5.1
Failure Scripts in Informatica 6.2

General guidelines – from Failure perspective
• All sessions should have a failure call in the post processing.
• If there is a requirement to call an SQL block before or after a session, it is better to write it as a stored procedure and call that than to write an SQL block.
• It is good practice to call the stored procedure as part of the mapping rather than calling it in a shell script.
• "Run if previous successful" should be set for every session so as to avoid runaway sessions.
• The "Fail parent if session fails" property should be checked in every session when coding in Informatica 6.2.
• The limit on the number of acceptable errors should always be set. It should preferably be 1000.
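The failure scripts themselves are not reproduced in this document. As an illustration only, a post-session failure call could look like the sketch below. Everything in it is an assumption rather than the FEB implementation: the function name notify_failure, the file name session_failures.log, the SUPPORT_MAIL variable, and the use of mailx for delivery.

```shell
#!/bin/sh
# Hypothetical failure hook: record the failure in a log file and,
# if a support address is configured and mailx is installed, mail it.
# $1 = session name, $2 = directory for the failure log
notify_failure() {
    session_name=$1
    log_dir=$2
    stamp=$(date '+%Y-%m-%d %H:%M:%S')
    message="$stamp FAILED session=$session_name"

    # Always keep a local record so failures can be tracked later.
    printf '%s\n' "$message" >> "$log_dir/session_failures.log" || return 1

    # Mail delivery is optional in this sketch (SUPPORT_MAIL is assumed).
    if [ -n "${SUPPORT_MAIL:-}" ] && command -v mailx >/dev/null 2>&1; then
        printf '%s\n' "$message" |
            mailx -s "Informatica session failed: $session_name" "$SUPPORT_MAIL"
    fi
}

# Example (commented out), as it might be called from post-session processing:
# notify_failure s_m_first_sample_session /webstatmmk1/post/sample/log
```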

7.5 TRUNCATING DATA
7.5.1 Challenge

Truncate data before loading, when an application user is being used to connect to the database.

7.5.2 Description
If existing data needs to be truncated and re-loaded, then a procedure should be written in Oracle to truncate the data instead of setting the "truncate before load" property at the target. With this method data can be truncated even when the Informatica sessions are connecting to the database using a non-DBA user. A sample of the procedure is given below. Only batch ids should have access to execute this procedure. The procedure can then be called from Informatica within the mapping, or in the pre-processing using a shell script.

    CREATE OR REPLACE PROCEDURE TruncateTable (
        p_tname  IN VARCHAR2,
        p_towner IN VARCHAR2
    ) IS
        v_ddl_line VARCHAR2(1000);
    BEGIN
        -- Build the DDL; DROP STORAGE releases the space held by the table.
        v_ddl_line := 'truncate table ' || p_towner || '.' || p_tname || ' drop storage';
        EXECUTE IMMEDIATE v_ddl_line;
    EXCEPTION
        WHEN OTHERS THEN
            -- Report the error, then re-raise so the caller sees the failure.
            dbms_output.put_line('Error : ' || TO_CHAR(SQLCODE) || ' ' || SQLERRM);
            RAISE;
    END;
    /
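Calling the procedure from a pre-processing shell script could be sketched as follows. This is an illustration under stated assumptions: the function names and the ORA_LOGIN variable (holding the batch user's connect string) are hypothetical, and the argument order follows the TruncateTable signature above (table name first, then owner). The names are checked as plain identifiers before being placed in the SQL text.

```shell
#!/bin/sh
# Hypothetical pre-processing helpers for calling TruncateTable.

# Succeeds only for non-empty names made of letters, digits and underscores.
is_identifier() {
    case $1 in
        ''|*[!A-Za-z0-9_]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Prints the SQL*Plus script that calls the procedure.
# $1 = table name, $2 = table owner
build_truncate_sql() {
    is_identifier "$1" && is_identifier "$2" || return 1
    printf 'WHENEVER SQLERROR EXIT FAILURE\n'
    printf "EXEC TruncateTable('%s', '%s')\n" "$1" "$2"
    printf 'EXIT\n'
}

# Runs the script through sqlplus in silent mode.
# ORA_LOGIN is an assumed variable of the form user/password@database.
run_truncate() {
    sql=$(build_truncate_sql "$1" "$2") || return 1
    printf '%s\n' "$sql" | sqlplus -s "$ORA_LOGIN"
}

# Example (commented out):
# run_truncate SALES_AGG SAMPLE
```

Passing the connect string on the command line makes it visible in the process list, so a real script may prefer to connect inside the SQL*Plus input instead.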

7.6 BUILT-IN RE-STARTABILITY
7.6.1 Challenge

Design sessions such that the support and maintenance effort is low

7.6.2 Description
Sessions should be created with built-in re-startability. In case of failure, it should be easy to re-start from the point of failure. When aggregates are being populated, the data for the period being loaded should first be deleted before the new data is inserted. Tasks should be broken into different sessions rather than calling all scripts as part of one session; then, if a given script fails, re-starting is easy.
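Where several scripts do have to be driven from one wrapper, the same idea can be approximated with a checkpoint file. The sketch below is an assumption-laden illustration, not an Informatica feature: run_steps and the checkpoint file are hypothetical. It records the number of the last step that succeeded, so that a re-run skips the completed steps and resumes at the one that failed.

```shell
#!/bin/sh
# Hypothetical restartable runner.
# Usage: run_steps <checkpoint-file> <step> [<step> ...]
# Each step is a command or function name taking no arguments.
run_steps() {
    checkpoint=$1
    shift
    done_count=0
    # A leftover checkpoint means a previous run stopped part-way.
    if [ -f "$checkpoint" ]; then
        done_count=$(cat "$checkpoint")
    fi
    step_no=0
    for step in "$@"; do
        step_no=$((step_no + 1))
        # Skip steps that already completed in the earlier run.
        if [ "$step_no" -le "$done_count" ]; then
            continue
        fi
        # On failure, stop and leave the checkpoint at the last good step.
        "$step" || return 1
        echo "$step_no" > "$checkpoint"
    done
    # Everything succeeded: the next run starts from the first step again.
    rm -f "$checkpoint"
}

# Example (commented out): delete_period, load_aggregates and archive_files
# stand for project scripts.
# run_steps /webstatmmk1/post/sample/temp/load.ckpt delete_period load_aggregates archive_files
```

Each step still has to be safe to repeat, which is why the delete-before-insert rule for aggregates matters.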


7.7 PROJECT DIRECTORY STRUCTURE IN UNIX
7.7.1 Challenge

Define a standard for organization of directories in Unix

7.7.2 Description
All examples are for a project named sample. The following directories should be created inside the home directory for each project:
• Bin – Directory for all the scripts used in the project (E.g. /webstatmmk1/post/sample/bin)
• Env – Directory for parameter and environment settings files (E.g. /webstatmmk1/post/sample/env)
• Incoming – Directory where the files that act as the source for the project should reside (E.g. /webstatmmk1/post/sample/incoming)
• Outgoing – Directory where the output files created by various processes should reside (E.g. /webstatmmk1/post/sample/outgoing)
• Temp – Directory for temporary files created by various processes; the bad files and lookup cache files created by Informatica should also reside in this directory (E.g. /webstatmmk1/post/sample/temp)
• Log – Directory for the log files generated by various processes in the project. The Informatica log files should be saved into this directory (E.g. /webstatmmk1/post/sample/log)
• Archive – Directory for storing files that need to be archived as part of the project (E.g. /webstatmmk1/post/sample/archive)

The directory where the log files are stored should be added to the crontab script that checks the number of errors and warnings in Informatica log files, so that it becomes easy to track sessions with many errors/warnings.
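The layout above can be created in one step, and the log check it mentions can be approximated with grep. Both functions below are a sketch: their names are hypothetical, the existing crontab script is not shown in this document, and matching the literal strings ERROR and WARNING is an assumption about the log format.

```shell
#!/bin/sh
# Hypothetical helpers for the project directory standard.

# Creates the seven standard directories under <base>/<project>.
# $1 = base directory (e.g. /webstatmmk1/post), $2 = project name
create_project_dirs() {
    for dir_name in bin env incoming outgoing temp log archive; do
        mkdir -p "$1/$2/$dir_name" || return 1
    done
}

# Prints one line per *.log file with the number of lines containing
# ERROR and WARNING (assumed markers; adjust to the real log format).
# $1 = log directory
count_log_issues() {
    for log_file in "$1"/*.log; do
        [ -f "$log_file" ] || continue
        # grep -c exits non-zero when the count is 0, hence "|| true".
        errors=$(grep -c 'ERROR' "$log_file" || true)
        warnings=$(grep -c 'WARNING' "$log_file" || true)
        printf '%s errors=%s warnings=%s\n' "$log_file" "$errors" "$warnings"
    done
}

# Example (commented out):
# create_project_dirs /webstatmmk1/post sample
# count_log_issues /webstatmmk1/post/sample/log
```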

7.8 PARAMETERIZATION OF SESSION INFORMATION
7.8.1 Challenge

Session information should be parameterized as far as possible so that migration of code between dev/QA and production can be done with minimum changes. The log files, bad files, target files, etc. can be separated for each application so that they do not affect each other.

7.8.2 Description
The session information that can be parameterized is:

Srl. #  Session Information
1. Session log file directory/name
2. Source database connector
3. Source file directory/name
4. Target database connector
5. Target file directory/name
6. Reject file directory/name
7. $Source connection value in the properties tab
8. $Target connection value in the properties tab

A sample parameter file

[SAMPLE.s_m_first_sample_session]
$PMSessionLogFile=/webstatmmk1/post/sample/log/s_m_first_sample_session.log
$DBConnection_sample_source=sample_source
$DBConnection_sample_target=sample_target
$RejectFile_sample=/webstatmmk1/post/sample/temp
$TargetFileDir_test=/webstatmmk1/post/sample/outgoing
$SrcFileDir_test=/webstatmmk1/post/sample/incoming

Parameter file header
The header should be FolderName.SessionName. The folder name is not required, but it is advised to include it.

Session log file Directory/name
The session log file name and directory can be parameterized. If only the file name needs to be parameterized, then the property “Session Log File Name” needs to be set to $PMSessionLogFile. If both the log file name and the directory need to be parameterized, then the property “Session Log File directory” should be left blank and the property “Session Log File Name” should be set to $PMSessionLogFile.

Database connection
The source and target database connection information can be parameterized.

Source/Target/reject File Directory/Name
The source, target or reject file names can be parameterized. If only the file name needs to be changed, the file name property can be set to $TargetFileDir_test and the value of the parameter can be set to a different file name. If the file as well as the directory needs to be changed, then the property “Output file directory” should be left blank and the file name should be populated as $TargetFileDir_test.

Session information that cannot be parameterized using a value in the parameter file
1. Information in the transformation tab
   a. Lookup and stored proc connection information
      i. The $Source and $Target that are defined in the properties tab can be used for the lookup and the stored proc connection information.
   b. Cache file location
      i. Unix soft links should be used so that the same string can be used in Development/QA and Production.
2. Parameter file name in the properties tab, an exception being when the session is scheduled by pmcmd. When using pmcmd the parameter file name is taken as an input parameter.
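Because only the values differ between environments, a parameter file in the format of the sample above can be generated rather than edited by hand. The following is a sketch under assumptions: the function name write_param_file is hypothetical, and the parameter names are simply those of the sample file. It prints the file to standard output so that the caller decides where to store it.

```shell
#!/bin/sh
# Hypothetical generator for a session parameter file.
# $1 = folder name, $2 = session name, $3 = project home directory,
# $4 = source connection name, $5 = target connection name
write_param_file() {
    # \$ keeps the dollar sign literal; the other variables are expanded.
    cat <<EOF
[$1.$2]
\$PMSessionLogFile=$3/log/$2.log
\$DBConnection_sample_source=$4
\$DBConnection_sample_target=$5
\$RejectFile_sample=$3/temp
\$TargetFileDir_test=$3/outgoing
\$SrcFileDir_test=$3/incoming
EOF
}

# Example (commented out): write the file into the project's env directory.
# write_param_file SAMPLE s_m_first_sample_session /webstatmmk1/post/sample \
#     sample_source sample_target > /webstatmmk1/post/sample/env/sample.par
```

The output file name sample.par in the example is illustrative; any name can be used as long as the session or the scheduler points at it.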
