
INFORMATICA COOKBOOK

INFORMATICA DEVELOPER’S GUIDE

Author : Sastry Kolluru


Creation Date :
Last Modified :
Version : 1.00

Approvals
Stephen Musgrove :
:
Informatica Cookbook

Change Record
DATE Author Version Reference
19-Apr-2004 Sastry Kolluru 1.00 Added sections 7.7 and 7.8

Reviewers
NAME POSITION

Version 1.00 Page 2 of 19



Table of Contents

1.0 OVERVIEW
2.0 GETTING STARTED
2.1 ABOUT INFORMATICA
2.1.1 Version in use
3.0 INFORMATICA DEVELOPMENT CYCLE
3.1 STARTING A NEW PROJECT
3.1.1 Project Initialization
3.1.2 Login
3.1.3 Folders and Groups setup
3.2 DEVELOPMENT AND TESTING PROCESS
3.3 MIGRATION TO PRODUCTION
3.3.1 Information to be provided
3.3.2 Review before movement
3.4 CHANGES TO AN EXISTING PROJECT
4.0 TRANSITION OF PROJECTS FOR SUPPORT
4.1 REQUIREMENTS FOR SUPPORT
4.2 SUPPORT PROCESS ON FAILURE
4.3 SUPPORT WINDOW
5.0 INFORMATICA ENVIRONMENTS
5.1 DEVELOPMENT
5.2 PRODUCTION
6.0 ENGINE MANAGEMENT
6.1 MANAGING THE ENGINE
6.2 RESTARTING THE ENGINE
7.0 BEST PRACTICES
7.1 NAMING STANDARDS
7.1.1 Challenge
7.1.2 Description
7.2 TEMPLATES
7.2.1 Challenge
7.2.2 Description
7.3 USAGE OF CONNECTION OBJECTS
7.3.1 Challenge
7.3.2 Description
7.4 FAILURE SCRIPTS
7.4.1 Challenge
7.4.2 Description
7.5 TRUNCATING DATA
7.5.1 Challenge
7.5.2 Description
7.6 BUILT-IN RE-STARTABILITY
7.6.1 Challenge
7.6.2 Description
7.7 PROJECT DIRECTORY STRUCTURE IN UNIX
7.7.1 Challenge
7.7.2 Description
7.8 PARAMETERIZATION OF SESSION INFORMATION
7.8.1 Challenge
7.8.2 Description


1.0 OVERVIEW
The objective of the Informatica Cookbook is to provide the Informatica user
community at Fidelity Investments with information regarding:
• Informatica infrastructure at FEB
• Processes for the development life cycle
• Best practices, tips and techniques

The cookbook is intended as a starting point for developers, so that they
understand the standards, processes and best practices before starting work on
the FEB Informatica infrastructure. It will also act as a refresher for
experienced developers on best practices and lessons learned by other users.

We intend to update this document regularly to incorporate better practices and
lessons learned.

2.0 GETTING STARTED

2.1 ABOUT INFORMATICA


Informatica PowerCenter is a data integration platform for building, deploying,
and managing enterprise data warehouses, and other data integration projects.
Informatica PowerCenter enables users to easily transform data from disparate
enterprise systems and sources into reliable information to support strategic
business initiatives.

2.1.1 Version in use


The versions of Informatica currently running are 5.1 and 6.2. All new
development should be done in version 6.2. All projects currently in
Informatica 5.1 will be migrated to Informatica 6.2.

3.0 INFORMATICA DEVELOPMENT CYCLE

3.1 STARTING A NEW PROJECT


3.1.1 Project Initialization
An e-mail must be sent to the Informatica Support team before the start of any
project. The e-mail should contain the following information:
1. Project Name
2. Project Contact
3. Folder Name
4. List of users accessing the folder
5. Informatica version planned to be used
6. Expected date of moving to Production
7. Expected number of sessions/mappings in the project

A minimum of 5 days' notice should be given before code is moved to production,
so that the move can be planned.

3.1.2 Login
Every user should have a login in both development and production. The Corp ID
will be used as the login for individual users. In development, users will be
given access to both create and execute mappings/sessions, whereas in production
only read access will be given. A request for a new login may be included in the
project initialization e-mail, or a separate e-mail may be sent to the
Informatica Support Group. A selective execute privilege can be requested for
specific sessions or workflows.

3.1.3 Folders and Groups setup


Folders will be created based on the information provided to the Informatica
Support Group as part of the project initialization process. Groups will be set
up in Informatica to manage user access to the various folders.

3.2 DEVELOPMENT AND TESTING PROCESS


All development should be done in the development instance of Informatica and
Oracle. Separate folders should be created in the same development repository
for development, QA and SIT.

The folders named <folder_name>_prod will be moved to production on request. The
naming convention to be followed is described in the Naming Standards best
practice in section 7.1.

3.3 MIGRATION TO PRODUCTION

3.3.1 Information to be provided

After coding and testing have been completed in development, the following
information should be provided to the Informatica Development Team to facilitate
the movement of code. The same applies to enhancements and bug fixes to existing
mappings/sessions.

1. Project name
2. Folder in development
3. List of session/mapping names
4. List of any scripts that need to be moved
5. Date when the movement has to be made


3.3.2 Review before movement

The Informatica Support Group will review mappings and sessions before they are
moved from development to production. The following are some of the important
points checked:

1. Whether existing database connectors, FTP connectors and external loaders
have been used
2. Whether the failure scripts have been added; refer to the Failure Scripts
best practice in section 7.4 for more details
3. Location of scripts, intermediate files and any data files
4. Location of lookup and other caches
5. If any intermediate files are generated as part of the process, they should
be deleted at the end of the process
6. Whether any existing code or setting can trigger a known bug
7. Whether the scheduling could affect the performance of existing sessions
8. Process improvements that would increase efficiency. Any project team can
approach the Informatica Support Team during the initial stages of the project
for a process review. If the coding methods used may affect existing systems,
the code will not be moved into production.
9. Restartability

3.4 CHANGES TO AN EXISTING PROJECT


If enhancements or bug fixes have been made to existing mappings/sessions, they
need to be tested in development, and the following information should be
provided to the Informatica Support Team:

1. Project name
2. Folder in development
3. List of session/mapping names
4. List of any scripts that need to be moved
5. Date when the movement has to be made

4.0 TRANSITION OF PROJECTS FOR SUPPORT

4.1 REQUIREMENTS FOR SUPPORT

1. An operations document describing the functionality of the sessions/mappings
to be supported.
2. A restart and recovery document explaining the actions to be taken for every
session/batch if there is a failure. It is recommended to create cleanup
scripts to avoid unnecessary manual intervention.
3. Support contact information, so that the contacts can be reached when
needed. Both a primary contact and a secondary contact should be provided.


4.2 SUPPORT PROCESS ON FAILURE

1. On failure, the session will send an e-mail/page to the support team. The
Informatica Support Team shall follow the restart and recovery process provided.
2. An e-mail will be sent to the primary and secondary contacts summarizing the
reason for the failure and the action taken.

4.3 SUPPORT WINDOW


Support for Informatica jobs will be provided during the following hours:

Monday to Friday
Onsite Support - 9:00AM to 6:00PM
Offshore Support - 11:00PM to 6:00PM

Saturday/Sunday and Holidays
Offshore Support - 11:00PM to 6:00PM

5.0 INFORMATICA ENVIRONMENTS

5.1 DEVELOPMENT
The Informatica Development Engine is set up on webstatdev. The repository is in
Oracle and is hosted on smmk94 so that it is backed up periodically. There are
development instances in versions 5.1 and 6.2.

PowerCenter 5.1
Repository name – EsiteDev
Repository Database -

PowerCenter 6.2
Repository name – PMNEW
Host Name - webstatdev
Port number - 5031

5.2 PRODUCTION
The Informatica Production Engine is set up on smmk94. The repository is in
Oracle and is hosted on smmk94. There are production instances in versions 5.1
and 6.2.

PowerCenter 5.1
Repository name – EsiteProd
Repository Database -

PowerCenter 6.2
Repository name – eSite62test
Host Name - smmk94
Port number - 5031

6.0 ENGINE MANAGEMENT

6.1 MANAGING THE ENGINE


The development and production engines shall be managed by the Informatica
Support Team. Information regarding planned downtime and upgrades shall be
provided to the user community from time to time.

6.2 RESTARTING THE ENGINE

An e-mail shall be sent to the user community before the engine is restarted.
After the engine has been brought back up, a confirmation will be sent so that
users can double-check the status of their sessions. If sessions have not been
scheduled properly, users should inform the Informatica Support Team.

7.0 BEST PRACTICES

7.1 NAMING STANDARDS


7.1.1 Challenge
Define standards to be used during development in Informatica

7.1.2 Description

Folders
Folders are a collection of mappings, sources, targets, sessions, and batches.

Syntax:
ProjectName_phase

Description:

Phase          ‘DEV’ - Development
               ‘SIT’ - Integration Testing
               ‘UAT’ - Acceptance Testing
               ‘PROD’ - Production
ProjectName    Acronym of Group Project

Note: not all phases may be required by each development group. Additional
folders can be created to meet the testing needs of the development teams.
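The folder syntax can be checked before a folder is requested. The following Python sketch is illustrative only and is not a tool provided by the Informatica Support Group. It assumes that the project acronym contains only letters and digits. Because section 3.2 refers to folders as <folder_name>_prod, it accepts the phase in either case. Additional folders may be created, so a name that returns None is not necessarily wrong; it simply falls outside the four standard phases.

```python
import re

# Phases defined in the folder naming standard (section 7.1.2)
PHASES = {
    "DEV": "Development",
    "SIT": "Integration Testing",
    "UAT": "Acceptance Testing",
    "PROD": "Production",
}

# Assumption: the project acronym is made of letters and digits only
FOLDER_PATTERN = re.compile(r"^([A-Za-z0-9]+)_([A-Za-z]+)$")


def parse_folder_name(name):
    """Return (project, phase description) if name follows ProjectName_phase, else None."""
    match = FOLDER_PATTERN.match(name)
    if match is None:
        return None
    project, phase = match.groups()
    # Phase comparison is case-insensitive (section 3.2 uses "_prod")
    description = PHASES.get(phase.upper())
    if description is None:
        return None
    return project, description


if __name__ == "__main__":
    for folder in ["SAMPLE_DEV", "SAMPLE_prod", "SAMPLE_SCRATCH", "SAMPLEDEV"]:
        print(folder, "->", parse_folder_name(folder))
```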

Ports
Ports are another name for fields. There are many kinds of Ports: Input, Output,
Variable, Lookup etc.

Variable port names should begin with the ‘v_’ prefix. Output ports that have
been added during coding should begin with the ‘o_’ prefix.

All other port names are at the discretion of the programming team.

Transforms
The names of these objects should describe what the transform does. Be as clear
and concise as possible. Prefixes are:
exp_ - Expressions
jnr_ - Joiners
fil_ - Filters
lkp_ - Lookups
agg_ - Aggregators
seq_ - Sequence Generator
sq_ - Source Qualifier
upd_ - Update Strategy
sp_ - Stored Procedure
nrm_ - Normalizer
rnk_ - Rank
rtr_ - Router
xsq_ - XML Source qualifier
srt_ - Sorter
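The prefix list lends itself to a simple lookup table. The Python sketch below checks a transform name against the prefix for its type. The transform type labels used as dictionary keys are assumptions chosen for readability, not names taken from the Informatica repository.

```python
# Prefixes from the transform naming standard (section 7.1.2)
TRANSFORM_PREFIXES = {
    "Expression": "exp_",
    "Joiner": "jnr_",
    "Filter": "fil_",
    "Lookup": "lkp_",
    "Aggregator": "agg_",
    "Sequence Generator": "seq_",
    "Source Qualifier": "sq_",
    "Update Strategy": "upd_",
    "Stored Procedure": "sp_",
    "Normalizer": "nrm_",
    "Rank": "rnk_",
    "Router": "rtr_",
    "XML Source Qualifier": "xsq_",
    "Sorter": "srt_",
}


def check_transform_name(transform_type, name):
    """Return a list of problems with a transform name (an empty list means it conforms)."""
    prefix = TRANSFORM_PREFIXES.get(transform_type)
    if prefix is None:
        return [f"no prefix defined for transform type {transform_type!r}"]
    problems = []
    if not name.startswith(prefix):
        problems.append(f"{name!r} should start with {prefix!r}")
    elif name == prefix:
        # The standard asks for a descriptive name after the prefix
        problems.append(f"{name!r} has no descriptive part after the prefix")
    return problems
```

For example, check_transform_name("Filter", "fil_active_accounts") returns an empty list, while the same name checked as a "Lookup" is reported as missing the lkp_ prefix.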

Sources and Targets

For database tables, default Source and Target names are derived from the ODBC
data source name and the table/view name of the object in the DBMS.

For files, default Source names are derived from FLATFILE:name of file.

Mappings

There are no standards for this category of object. However, it is strongly
suggested NOT to use the default name. It is suggested that all mappings begin
with the letter m.

Sessions/Batches and workflows

Sessions and batches are the descriptive components that wrap the mappings and
specify how, when and with which sources/targets a mapping is executed.


Syntax :
Qualifier_Batch/SessionName

Description:
Qualifier - ‘s’ for Session
‘b’ for Batch
‘wf’ for workflow
‘wl’ for worklet

Batch/SessionName - Free form text, usually the Mapping Name without the
prefix ‘m’.
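The following Python sketch derives session, batch, workflow and worklet names from a mapping name under this convention. It assumes that mapping names begin either with ‘m_’ or with ‘m’ followed by a capital letter, and it strips that prefix. The function name is hypothetical. Teams that keep the full mapping name, as in the sample parameter file in section 7.8 (s_m_first_sample_session), would simply skip the stripping step.

```python
# Qualifiers from the session/batch/workflow naming standard (section 7.1.2)
QUALIFIERS = {"session": "s", "batch": "b", "workflow": "wf", "worklet": "wl"}


def derive_name(object_type, mapping_name):
    """Build Qualifier_Name from a mapping name, dropping its leading 'm' prefix.

    Assumption: mapping names look like m_<name> or m<Name>; anything else
    (for example 'merge_accounts') is used unchanged.
    """
    qualifier = QUALIFIERS[object_type]
    if mapping_name.startswith("m_"):
        base = mapping_name[2:]
    elif len(mapping_name) > 1 and mapping_name[0] == "m" and mapping_name[1].isupper():
        base = mapping_name[1:]
    else:
        base = mapping_name
    return f"{qualifier}_{base}"
```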

Database Connections at the Server

The PowerMart™ engine requires database connections to be defined on the machine
on which the engine is running. To establish clear connection names, the
following standard should be used:

For Oracle Connections:

Syntax:
database_LogonID

Description:
database - The Oracle Schema
LogonID - The user id to use when logging into the source/target

Example:
CAP1_powerm

For Sybase Connections:

Syntax:
server_database_LogonID

Description:
Server - The server name
Database - The database name
LogonID - The user id to use when logging into the source/target

Example:
dbp1_powerm

For MS-SQLServer Connections:

Syntax:
Server_Database_LogonID

Description:
Database - The Database name
LogonID - The user id to use when logging into the source/target


Example:
dbp1_powerm
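The connection naming rules can be captured in one helper. The Python sketch below builds a name from its parts. Following the guideline in section 7.3 that the administrative user should not be used, it also rejects a few well-known administrative logins. That list is an illustrative assumption, not a complete inventory.

```python
# Hypothetical list of administrative logins to reject (section 7.3: use an
# application-specific batch user, never the administrative user)
ADMIN_LOGONS = {"sys", "system", "sa"}


def connection_name(db_type, logon_id, database, server=None):
    """Build a database connection name following the section 7.1.2 standard."""
    if logon_id.lower() in ADMIN_LOGONS:
        raise ValueError(f"{logon_id!r} is an administrative user; use a batch user")
    if db_type == "oracle":
        # database_LogonID, where database is the Oracle schema
        parts = [database, logon_id]
    elif db_type in ("sybase", "sqlserver"):
        if server is None:
            raise ValueError(f"{db_type} connection names need a server name")
        # server_database_LogonID
        parts = [server, database, logon_id]
    else:
        raise ValueError(f"no naming standard defined for {db_type!r}")
    return "_".join(parts)
```

With the Oracle example above, connection_name("oracle", "powerm", "CAP1") gives CAP1_powerm.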

External loader at the Server

The PowerMart™ engine requires an external loader on the machine on which the
engine is running in order to use bulk loading utilities to load data into
databases. To establish clear loader names, the following standard should be used:

For Oracle loader:

Syntax:
SQLLDR_Schema_LogonID

Description:
Schema - The Oracle Schema
LogonID - The user id to use when logging into the source/target

Example:
CAP1_powerm

7.2 TEMPLATES
7.2.1 Challenge
Develop a method of documenting Informatica code that eases both development and
the transition to a support team.

7.2.2 Description

A template document has been created to document the logic in the Informatica
transforms. This document serves as a master list of all activities to be done.
One template document will be created for every mapping. The template document
consists of the following sections:

Setup
This section would contain the details of source and target, the intermediate data
elements and any comments at the template level.

Process overview
This section would contain a pictorial representation of the mapping that
clarifies the data flow.

Target to source mapping


This section would have details on transformations to be done between the
source and the target fields. These transformations would be mapped with
respect to each target field.


Error handling
This section would contain the error conditions and the actions to be taken for
each of the error conditions.

Re-start and Recovery


This section would detail the restart and recovery strategy in cases of failure.

Setup

Setup has the following details:

#   Name - Description
1.  Mapping Name - The name of the mapping document.
2.  Description - Any detailed description found necessary for the document.
3.  Source - Details of the source for the mapping.
4.  Target - Details of the target for the mapping.
5.  Initial Rows - The average number of records expected to be processed; this
    will be used for database size estimation and the load window.
6.  Load Frequency - The frequency of loads, e.g. daily, weekly or monthly.
7.  Load Window - The time period during which the load will take place.
8.  Pre-processing - The activities to be done before processing the
    transformations. Any specific checks should be added here.
9.  Post-processing - The activities to be done after the transformation
    process is complete. Any specific checks should be added here.
10. Remarks - Any remarks applicable at the mapping level.

Sources
1. Tables - The source table name, the schema/owner name and any filter
   condition to be applied to the table. If multiple tables are present, all
   the table names must be added, and the relationship between the tables
   provided in the relationship column.
2. File - The source file name, the location of the file, the file type, the
   file format, the relationship between the various files, and whether a
   header and footer are present.

Target
1. Tables - The target table name and the schema/owner name. If multiple tables
   are present, all the table names must be added, and the relationship between
   the tables provided in the relationship column.
2. File - The target file name, the location of the file, the file type, the
   file format, the relationship between the various files, and whether a
   header and footer are present.

Lookups
1. Lookup name - The name of the lookup.
2. Lookup Table - The source of the data.
3. Table Owner - The owner of the table.
4. Lookup Columns - The columns to be included in the lookup.
5. Filter - The condition to be applied to the data fetched from the table.
6. Comments - The context in which the lookup is used.

Source to target mapping

#  Name - Description
1. Target Table name - The name of the ODS table.
2. Target field name - The name of the target field.
3. Target datatype - The datatype of the target field.
4. Target mandatory - Indicates whether the field is mandatory.
5. Default value - The default value if the field is null.
6. Source Table/File name - The table/file name of the source.
7. Source field name - The name of the source field.
8. Comments and detailed transformations - The details of all transformations
   to be done.
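A team that keeps the source to target section in a spreadsheet can map the columns above onto a record type. The Python sketch below shows one possible representation; the field names are chosen for illustration. It writes the rows as CSV so that they can be opened alongside the template.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields


@dataclass
class MappingRow:
    """One row of the source to target mapping section of the template."""
    target_table: str
    target_field: str
    target_datatype: str
    target_mandatory: bool
    default_value: str
    source_table_or_file: str
    source_field: str
    transformation: str


def rows_to_csv(rows):
    """Render template rows as CSV text, with the field names as the header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([f.name for f in fields(MappingRow)])
    for row in rows:
        writer.writerow(astuple(row))
    return buffer.getvalue()
```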

Error Handling
Any specific error handling needs can be specified in this section of the template.

Re-start and Recovery


Any recovery needs of the mapping should be described in this section. If any special
script needs to be run or data needs to be deleted before re-running a session it
should be described here.

7.3 USAGE OF CONNECTION OBJECTS


7.3.1 Challenge
Define and use connection objects such as database connections, FTP connections
and external loader connections, so that redundancy is eliminated and these
objects are easy to manage.

7.3.2 Description
• When connecting to the database, the administrative user should not be used;
an application-specific batch user should be used instead
• The naming convention to be followed is as specified in the Naming Standards
section 7.1
• The name of the connection object in QA and production should be the same
• When using the external loader, instead of specifying
/webstatmmk1/oracle/product/9.2.0.2/bin/sqlldr as the external loader
executable, use the shell script /webstatmmk1/ia/pm47/sh_load or
/webstatmmk1/ia/pm47/sh_load_parallel_direct

7.4 FAILURE SCRIPTS


7.4.1 Challenge

Develop a mechanism by which errors can be tracked and comprehended

7.4.2 Description

Implementation of the failure script

Failure Scripts in Informatica 5.1

Failure Scripts in Informatica 6.2

General guidelines – from a failure perspective

• All sessions should have a failure call in the post-processing
• If there is a requirement to run an SQL block before or after a session, it is
better to write it as a stored procedure and call that procedure than to run an
SQL block
• It is good practice to call the stored procedure as part of the mapping rather
than in a shell script
• “Run if previous successful” should be set for every session to avoid runaway
sessions
• The “Fail parent if session fails” property should be checked in every session
when coding in Informatica 6.2
• The limit on the number of acceptable errors should always be set, preferably
to 1000

7.5 TRUNCATING DATA

7.5.1 Challenge

Truncate data before loading, when an application user is being used to connect
to the database.

7.5.2 Description

If existing data needs to be truncated and reloaded, a procedure should be
written in Oracle to truncate the data, instead of setting the “truncate before
load” property on the target. With this method, data can be truncated even when
the Informatica sessions connect to the database using a non-DBA user: a stored
procedure runs with its owner's privileges by default (definer's rights), so the
calling user needs only EXECUTE permission on the procedure. A sample of the
procedure is given below. Only batch IDs should have access to execute this
procedure.

This procedure can then be called from Informatica within the mapping, or in the
pre-processing using a shell script.

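CREATE OR REPLACE PROCEDURE TruncateTable (p_tname IN VARCHAR2, p_towner IN VARCHAR2)
IS
    v_ddl_line VARCHAR2(1000);
BEGIN
    -- TRUNCATE is DDL, so it must be issued as dynamic SQL
    v_ddl_line := 'truncate table ' || p_towner || '.' || p_tname || ' drop storage';
    EXECUTE IMMEDIATE v_ddl_line;
EXCEPTION
    WHEN OTHERS THEN
        dbms_output.put_line('Error : ' || TO_CHAR(SQLCODE) || ' ' || SQLERRM);
        -- Re-raise so that the calling session fails instead of loading
        -- on top of data that was not truncated
        RAISE;
END TruncateTable;
/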

7.6 BUILT-IN RE-STARTABILITY

7.6.1 Challenge
Design sessions such that the support and maintenance effort is low

7.6.2 Description

Sessions should be created with built-in restartability. In case of failure, it
should be easy to restart from the point of failure.

When aggregates are being populated, the existing data for the period being
loaded should first be deleted before the new data is inserted.

Tasks should be broken into separate sessions rather than calling all scripts as
part of one session. That way, if a given script fails, restarting is easy.
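The following example illustrates the delete-then-insert pattern for aggregates, using Python's built-in sqlite3 module as a stand-in for Oracle. The table sales_agg and its columns are hypothetical. The delete and the insert for a period run in one transaction, so re-running the load after a failure replaces that period's rows instead of duplicating them.

```python
import sqlite3


def load_period(conn, period, rows):
    """Reload one period of an aggregate table so the load can be safely re-run.

    rows is a list of (region, total) pairs for the given period.
    """
    # One transaction: the delete and insert commit together or roll back together
    with conn:
        conn.execute("DELETE FROM sales_agg WHERE period = ?", (period,))
        conn.executemany(
            "INSERT INTO sales_agg (period, region, total) VALUES (?, ?, ?)",
            [(period, region, total) for region, total in rows],
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_agg (period TEXT, region TEXT, total REAL)")
    rows = [("EAST", 100.0), ("WEST", 250.0)]
    load_period(conn, "2004-03", rows)
    load_period(conn, "2004-03", rows)  # simulated restart of the same period
    print(conn.execute("SELECT COUNT(*) FROM sales_agg").fetchone()[0])  # 2
```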


7.7 PROJECT DIRECTORY STRUCTURE IN UNIX


7.7.1 Challenge
Define a standard for organization of directories in Unix

7.7.2 Description

All examples are for a project named sample.

The following directories should be created inside the home directory of each
project:
• Bin – Directory for all the scripts used in the project (e.g.
/webstatmmk1/post/sample/bin)
• Env – Directory for parameter and environment settings files (e.g.
/webstatmmk1/post/sample/env)
• Incoming – Directory where the files that act as the source for the
project should reside (e.g. /webstatmmk1/post/sample/incoming)
• Outgoing – Directory where the output files created by various processes
should reside (e.g. /webstatmmk1/post/sample/outgoing)
• Temp – Directory for temporary files created by various processes; the
bad files and lookup cache files created by Informatica should also reside
in this directory (e.g. /webstatmmk1/post/sample/temp)
• Log – Directory for the log files generated by various processes in the
project. The Informatica log files should be saved into this directory (e.g.
/webstatmmk1/post/sample/log)
• Archive – Directory for storing files that need to be archived as part of
the project (e.g. /webstatmmk1/post/sample/archive)
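A short script can create this directory layout so that every project starts from the same structure. The following Python sketch assumes that the project home sits under a base directory such as /webstatmmk1/post, as in the examples above. The function name is hypothetical.

```python
import os
import tempfile

# Subdirectories required by the project directory standard (section 7.7)
PROJECT_SUBDIRS = ["bin", "env", "incoming", "outgoing", "temp", "log", "archive"]


def create_project_dirs(base_dir, project):
    """Create <base_dir>/<project>/<subdir> for every standard subdirectory.

    Existing directories are left untouched, so the function can be re-run
    safely. Returns the list of directory paths.
    """
    paths = []
    for subdir in PROJECT_SUBDIRS:
        path = os.path.join(base_dir, project, subdir)
        os.makedirs(path, exist_ok=True)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # On the server this would be create_project_dirs("/webstatmmk1/post", "sample");
    # a temporary directory is used here so the example is safe to run anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_project_dirs(tmp, "sample"):
            print(path)
```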

The directory where the log files are stored should be added to the crontab
script that checks the number of errors and warnings in Informatica log files,
so that sessions with many errors/warnings can be tracked easily.
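The following Python sketch shows the kind of check such a crontab script might perform. It assumes that error and warning lines can be recognized by the words ERROR and WARNING. Confirm the actual Informatica log format and adjust the patterns accordingly. The thresholds in sessions_over_threshold are placeholders, and the crontab scheduling itself is not shown.

```python
import os
import re

# Assumption: an error or warning line contains the word ERROR or WARNING.
# Adjust these patterns to the actual format of the Informatica log files.
ERROR_RE = re.compile(r"\bERROR\b")
WARNING_RE = re.compile(r"\bWARNING\b")


def count_log_issues(log_dir, suffix=".log"):
    """Return {file name: (errors, warnings)} for every log file in log_dir."""
    counts = {}
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(suffix):
            continue
        errors = warnings = 0
        with open(os.path.join(log_dir, name), errors="replace") as handle:
            for line in handle:
                if ERROR_RE.search(line):
                    errors += 1
                if WARNING_RE.search(line):
                    warnings += 1
        counts[name] = (errors, warnings)
    return counts


def sessions_over_threshold(counts, max_errors=0, max_warnings=100):
    """List log files whose error or warning counts exceed the (hypothetical) limits."""
    return [name for name, (e, w) in counts.items() if e > max_errors or w > max_warnings]
```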

7.8 PARAMETERIZATION OF SESSION INFORMATION

7.8.1 Challenge
Session information should be parameterized as far as possible, so that code can
be migrated between dev/QA and production with minimum changes. The log files,
bad files, target files etc. can be separated for each application so that they
don’t affect each other.

7.8.2 Description

The session information that can be parameterized is:

Srl. #  Session Information
1.      Session log file directory/name
2.      Source database connector
3.      Source file directory/name
4.      Target database connector
5.      Target file directory/name
6.      Reject file directory/name
7.      $Source connection value in the properties tab
8.      $Target connection value in the properties tab

A sample parameter file:

[SAMPLE.s_m_first_sample_session]
$PMSessionLogFile=/webstatmmk1/post/sample/log/s_m_first_sample_session.log
$DBConnection_sample_source=sample_source
$DBConnection_sample_target=sample_target
$RejectFile_sample=/webstatmmk1/post/sample/temp
$TargetFileDir_test=/webstatmmk1/post/sample/outgoing
$SrcFileDir_test=/webstatmmk1/post/sample/incoming
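Connection object names are the same in QA and production (section 7.3), so a parameter section often differs between environments only in its directory paths. The Python sketch below renders a section from a dictionary. The helper sample_session_params, which is hypothetical, reproduces the sample above for a given project home. The folder name is treated as optional, as described under the parameter file header.

```python
def parameter_section(session, params, folder=None):
    """Render one parameter file section headed [FolderName.SessionName].

    The folder name is optional but recommended (section 7.8.2).
    """
    header = f"[{folder}.{session}]" if folder else f"[{session}]"
    lines = [header]
    for name, value in params.items():
        # Session parameters are referenced with a leading '$'
        if not name.startswith("$"):
            raise ValueError(f"parameter name {name!r} should start with '$'")
        lines.append(f"{name}={value}")
    return "\n".join(lines) + "\n"


def sample_session_params(project_home, session):
    """Hypothetical helper: the sample parameters, rooted at one project home."""
    return {
        "$PMSessionLogFile": f"{project_home}/log/{session}.log",
        "$DBConnection_sample_source": "sample_source",
        "$DBConnection_sample_target": "sample_target",
        "$RejectFile_sample": f"{project_home}/temp",
        "$TargetFileDir_test": f"{project_home}/outgoing",
        "$SrcFileDir_test": f"{project_home}/incoming",
    }
```

Calling parameter_section("s_m_first_sample_session", sample_session_params("/webstatmmk1/post/sample", "s_m_first_sample_session"), folder="SAMPLE") produces the sample section shown above.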

Parameter file header

The header should be FolderName.SessionName. The folder name is not required,
but it is advisable to include it.

Session log file Directory/name

The session log file name and directory can be parameterized. If only the file
name needs to be parameterized, the property “Session Log File Name” needs to be
set to $PMSessionLogFile. If both the log file name and the directory need to be
parameterized, the property “Session Log File Directory” should be left blank and
the property “Session Log File Name” should be set to $PMSessionLogFile.

Database connection

The source and target database connection information can be parameterized.

Source/Target/reject File Directory/Name

The Source/Target or Reject file names can be parameterized. If only the file
name needs to be changed, the file name property should be set to
$TargetFileDir_test, and the value of the parameter can then be set to a
different file name. If both the file and the directory need to be changed, the
property “Output file directory” should be left blank and the file name property
should be populated with $TargetFileDir_test.

Session information that cannot be parameterized using a value in the parameter
file

1. Information in the transformation tab
   a. Lookup and stored procedure connection information
      i. The $Source and $Target connections defined in the properties tab can
         be used for the lookup and stored procedure connection information
   b. Cache file location
      i. Unix soft links should be used so that the same path string can be
         used in Development/QA and Production
2. The parameter file name in the properties tab. The exception is when the
session is scheduled using pmcmd, in which case the parameter file name is
passed as an input parameter.
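The soft-link approach for cache file locations can be sketched in Python as follows. The paths are hypothetical, and on the Unix command line ln -s achieves the same result. Sessions always reference the link path, and only the link target differs between Development/QA and Production.

```python
import os


def point_cache_link(link_path, env_cache_dir):
    """Make link_path a soft link to this environment's cache directory.

    Sessions always reference link_path, so the same string works in every
    environment; only the link target differs. An existing link is replaced,
    but a real file or directory at link_path is left alone (os.symlink raises).
    Returns the resolved target of the link.
    """
    os.makedirs(env_cache_dir, exist_ok=True)
    if os.path.islink(link_path):
        os.remove(link_path)  # removes the link itself, not the directory it points to
    os.symlink(env_cache_dir, link_path)
    return os.path.realpath(link_path)
```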
