
Informatica PowerCenter

Course Objectives
At the end of this course you will be able to:
- Understand how to use all major PowerCenter components
- Build basic ETL mappings
- Create and run Workflows
- Perform basic repository administration tasks
- Troubleshoot problems

Agenda
Duration: 1.5 hrs per day; a 4-week training course. Version: 8.6

Extract, Transform, and Load


[Diagram] Operational Systems (RDBMS, Mainframe, Other) hold transaction-level, normalized or de-normalized data. ETL: Extract -> Transform (aggregate data, cleanse data, consolidate data, apply business rules, de-normalize) -> Load into the Data Warehouse, which holds aggregated, historical data for Decision Support.

4

Informatica as ETL

Power Center Architecture


Node - a representation of a server running the PowerCenter Server software.
Domain - a collection of Repository Services. Domains can run on one or more Nodes, and can utilize Load Balancing if this has been licensed.
Repository Service - synonymous with a repository, in that each service manages only one repository. All communication with the Repository (e.g. from the Designer, or when running workflows) is managed by the Repository Service.
Integration Service - manages the running of workflows and sessions.
Reporting Service - an optional service that provides quick reports on source/target dependencies etc.
Repository - a set of tables in a relational database representing one or more folders. Each folder, when viewed through the Designer, contains source and target definitions, mappings, and mapplets. When viewed through the Workflow Manager, each folder contains tasks, workflows, worklets, and sessions.
6

Power Center Architecture

Installation for Informatica


Domain Objects
Integration Service, Repository Service, Repository Service Process, Node

Client Objects
Designer Repository Manager Workflow Manager Workflow Monitor
8

Client Tools- Overview


Designer - used for source and target definitions; mapping, mapplet, and transformation development.
Repository Manager - used for folder, security, and version management.
Workflow Manager - design sessions, workflows, database connections, servers, and other run-time requirements for execution.
Workflow Monitor - monitor execution progress through a GUI.

Informatica Repository
The Informatica repository is a set of tables that store the metadata created using the Informatica Client tools. Metadata is added to the repository tables when we perform tasks in the Informatica Client application such as developing mappings or creating sessions. The Workflow Manager adds metadata to the repository tables in the form of tasks and workflows. The Integration Service creates metadata in the repository such as start and finish times of tasks as well as workflow status.
10

PowerCenter Administration Console


The Administration Console is a web application used to manage a PowerCenter domain. If you have a user login to the domain, you can access the Administration Console to perform administrative tasks such as managing logs, user accounts, and domain objects. Domain objects include services, nodes, and licenses.

11

Integration Service
The Integration Service reads mapping and session information from the repository. It extracts data from the mapping sources and stores the data in memory while it applies the transformation rules that are configured in the mapping. The Integration Service loads the transformed data into the mapping targets.

12

Repository Service
The Repository Service is an application service that manages the repository. It retrieves, inserts, and updates metadata in the repository database tables. Select a Repository Service in the Navigator to access information about the service.
Integration Service

Repository Service

Repository Manager

Repository Agent(s)

Repository
13

Repository Service Process


A service process is the physical representation of a service running on a node. The process for a Repository Service is the pmrepagent process. At any given time, only one service process is running for the service in the domain.

14

Installations Required
- Oracle 10g or any relational DB
- PL/SQL Developer or Toad
- Informatica 8.x

15

Lab
Installation and Repository Creation

16

Installation and Configuration


Informatica Installation and Configuration Steps
A four-step process for installing Informatica:
1. Install the Informatica Client, Repository Server, Informatica Server, and ODBC Driver
2. Configure the Repository Server
3. Configure the Informatica Server
4. Register the Informatica Server with the Repository

17

The Design Process

18

Design Process
1. Create Source definition(s) 2. Create Target definition(s) 3. Create a Mapping 4. Create a Session Task 5. Create a Workflow from Task components 6. Run the Workflow and verify the results

19

Source Object Definitions

20

Source Object Definitions


By the end of this section you will be able to: Understand Source definition properties Create Source definitions Use the Data Preview option

21

Methods of Analyzing Sources


- Import from Database
- Import from File
- Import from COBOL File
- Import from XML File
- Create manually

[Diagram] The Source Analyzer imports Relational, XML file, Flat file, and COBOL file definitions into the Repository.
22

Analyzing Relational Sources


[Diagram] The Source Analyzer imports a Table, View, or Synonym definition (DEF) from a relational source over ODBC; the definition travels via the Repository Server (TCP/IP) and the Repository Agent (native connection) into the Repository.
23

Analyzing Relational Sources


Editing Source Definition Properties

24

Analyzing Flat File Sources


[Diagram] The Source Analyzer reads a flat file (fixed-width or delimited) from a mapped drive or local directory; the definition (DEF) travels via the Repository Server (TCP/IP) and the Repository Agent (native connection) into the Repository.
25

Flat File Wizard


Three-step wizard Columns can be renamed within wizard Text, Numeric and Datetime datatypes are supported Wizard guesses datatype

26

Flat File Source Properties

27

Data Previewer
Preview data in: Relational Sources, Flat File Sources, Relational Targets, Flat File Targets

The Data Preview option is available in: Source Analyzer, Warehouse Designer, Mapping Designer, Mapplet Designer

28

Data Previewer - Source Analyzer


Data Preview example: from the Source Analyzer, select the Source drop-down menu, then Preview Data

Enter connection information in the dialog box


A right mouse click on the object can also be used to preview data 29

Data Preview - Source Analyzer


Data Preview Example (continued) Data Display

30

Lab - Analyze Source Data

31

Target Object Definitions

32

Target Object Definitions


By the end of this section you will be able to: Understand Target definition properties Know the supported methods of creating Target definitions

33

Creating Target Definitions


Methods of creating target definitions Automatic Creation Import from Database Import from an XML file Manual Creation

34

Automatic Target Creation


Drag (a Source definition) and drop (in the Warehouse Designer workspace)

35

Import Definition from Database


Can reverse-engineer existing object definitions from a database system catalog or data dictionary.

[Diagram] The Warehouse Designer imports a Table, View, or Synonym definition (DEF) from the database; the definition travels via the Repository Server (TCP/IP) and the Repository Agent (native connection) into the Repository.
36

Manual Target Creation


1. Create empty definition 2. Add desired columns

3. Finished target definition

ALT-F can also be used to create a new field

37

Target Definition Properties

38

Target Definition Properties

39

Creating Physical Tables

[Diagram] Execute SQL via the Designer: the Repository's logical target table definitions (DEF) are used to create physical tables in the target database.
40

Creating Physical Tables


Create tables that do not already exist in target database
Connect - connect to the target database Generate SQL file - create DDL in a script file Edit SQL file - modify DDL script as needed Execute SQL file - create physical tables in target database

Use Preview Data to verify the results (right mouse click on object)

41
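The Generate SQL step above can be sketched in Python: build a CREATE TABLE script from a logical target definition. The function name, column tuples, and Oracle-style datatypes are illustrative assumptions, not PowerCenter's internal representation.

```python
# Sketch of the "Generate SQL file" step: turn a logical target
# definition into DDL. Column tuples are (name, datatype, nullable).
def generate_ddl(table_name, columns):
    lines = []
    for name, dtype, nullable in columns:
        null_sql = "" if nullable else " NOT NULL"
        lines.append(f"    {name} {dtype}{null_sql}")
    return f"CREATE TABLE {table_name} (\n" + ",\n".join(lines) + "\n)"

ddl = generate_ddl("CUSTOMERS_DIM", [
    ("CUSTOMER_KEY", "NUMBER(10)", False),
    ("CUSTOMER_NAME", "VARCHAR2(100)", True),
])
```

The generated script would then be edited as needed and executed against the target database, as in the Connect/Generate/Edit/Execute sequence above.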

Lab Import Target Schema

42

Transformation Concepts

43

Transformation Concepts
By the end of this section you will be familiar with: Transformation types and views Transformation properties Informatica data types The Expression transformation The Filter transformation

44

What is a transformation? A transformation is any part of a mapping that generates or modifies data.

Transformation types: ACTIVE or PASSIVE

45

Transformations
Informatica PowerCenter provides the following objects for data transformation:
- Source Qualifier: reads data from flat file and relational sources
- XML Source Qualifier: reads XML data
- ERP Source Qualifier: reads ERP object sources
- Normalizer: reorganizes records from relational and flat file sources
- Expression: performs row-level calculations
- Aggregator: performs aggregate calculations
- Filter: drops rows conditionally
- Router: splits rows conditionally
- Sorter: sorts data
46

Transformations
Transformation objects (continued):
- Update Strategy: tags rows for insert, update, delete, reject
- Lookup: looks up values and passes them to other objects
- Joiner: joins heterogeneous sources
- Stored Procedure: calls a database stored procedure
- Union: performs a UNION ALL of two or more sources
- External Procedure (TX): calls compiled code for each row
- Sequence Generator: generates unique ID values
- Rank: limits records to a top or bottom range

47

Port
A port represents a single column of data. Every transformation object definition contains a collection of ports. Each port can be defined as an:

- Input port (receives data - a data sink)
- Output port (provides data - a data source)
- Input/Output port (passes data through - both a data sink and a data source)
- Variable port (stores intermediate data values)
- Lookup port (used in the Lookup transformation)
- Return port (used in the Lookup transformation)

48
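The port types above can be modeled in miniature. This is an illustrative sketch, not Informatica's API: the `Port` class and its two predicate methods are assumptions made for the example.

```python
# Illustrative model of transformation ports: the port type determines
# whether a port receives data, provides data, both, or stores state.
from dataclasses import dataclass

@dataclass
class Port:
    name: str
    datatype: str
    port_type: str  # input, output, input/output, variable, lookup, return

    def receives_data(self):
        return self.port_type in {"input", "input/output", "lookup"}

    def provides_data(self):
        return self.port_type in {"output", "input/output", "return"}

date_in = Port("DATE_ENTERED", "date/time", "input")
month_out = Port("MONTH", "integer", "output")
passthrough = Port("CUST_ID", "integer", "input/output")
```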

Port - Sample

Input Ports

49

Transformation Views
A transformation has three views:
Iconized View - shows the transformation in relation to the rest of the mapping Normal View - shows the data flow through the transformation Edit View - shows the transformation ports and properties. Allows editing

50

Normal View
Shows data flow through the transformation
Data passes through I/O ports unchanged

DATE_ENTERED passes in to the transformation through an input port It is used in the MONTH port to extract the month The month is passed through MONTH output port
51

Edit Mode
Allows users with write privileges to change or create transformation properties:
- Define port-level handling
- Enter comments
- Make reusable
- Switch between transformations
- Set the default value for the selected port
- Define transformation-level properties
52

Default Values
- For input and input/output ports, default values are used to replace null values
- For output ports, default values are used to handle transformation errors (not null handling)
- The default value expression can be validated for the selected port

53

Informatica Data Types


NATIVE DATATYPES - specific to the source and target database types; displayed in source and target tables within the Mapping Designer.

TRANSFORMATION DATATYPES - PowerCenter internal datatypes based on ANSI SQL-92; displayed in transformations within the Mapping Designer.

Transformation datatypes allow mixing and matching of source and target database types. When connecting ports, native and transformation datatypes must be compatible (or they must be converted).

54

Datatype Conversions
          Integer  Decimal  Double  Char  Date  Raw
Integer      X        X       X      X
Decimal      X        X       X      X
Double       X        X       X      X
Char         X        X       X      X     X
Date                                 X     X
Raw                                              X

- All numeric data can be converted to all other numeric datatypes (integer, double, decimal)
- All numeric data can be converted to string and vice versa
- Date can be converted only to date and string, and vice versa
- Raw (binary) can only be converted to raw
- Other conversions not listed above are not supported
- These conversions are implicit; no function is necessary
55
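The conversion rules above can be restated as a small lookup function. The datatype names are simplified to the six families in the matrix; the function itself is an illustration, not anything PowerCenter exposes.

```python
# The implicit-conversion matrix as code: which datatype families can
# be converted to which.
NUMERIC = {"integer", "decimal", "double"}

def can_convert(src, dst):
    if src in NUMERIC:
        return dst in NUMERIC or dst == "char"   # numeric -> numeric/string
    if src == "char":
        return dst in NUMERIC or dst in {"char", "date"}  # string -> numeric/date
    if src == "date":
        return dst in {"date", "char"}           # date -> date/string only
    if src == "raw":
        return dst == "raw"                      # raw -> raw only
    return False
```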

Expression Transformation
Perform calculations using non-aggregate functions (row level)
- Passive transformation
- Connected mode only
- Ports: mixed; variables allowed
- Create the expression in an output or variable port
- Usage: performs the majority of data manipulation
56


Expression Editor
An expression is a calculation or conditional statement Used in Expression, Aggregator, Rank, Filter, Router, Update Strategy Performs calculation based on ports, functions, operators, variables, literals, constants, and return values from other transformations

57

Expression Validation
The Validate or OK button in the Expression Editor will:
- Parse the current expression, including remote port searching (resolves references to ports in other transformations)
- Parse transformation attributes, such as the filter condition, lookup condition, SQL query, etc.
- Parse default values
- Check spelling, the correct number of arguments in functions, and other syntax errors

58

Informatica Functions
Character functions, used to manipulate character data:
ASCII, CHR, CHRCODE, CONCAT, INITCAP, INSTR, LENGTH, LOWER, LPAD, LTRIM, RPAD, RTRIM, SUBSTR, UPPER, REPLACESTR, REPLACECHR

- CHRCODE returns the numeric value (ASCII or Unicode) of the first character of the string passed to it
- CONCAT is for backwards compatibility only; use the || operator instead

59
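Rough Python equivalents of a few of these character functions show what they compute. Edge-case behavior (e.g. LPAD truncating to the target length, INSTR's 1-based positions with 0 for "not found") is my reading of typical SQL-style semantics, simplified for illustration.

```python
# Approximate equivalents of LPAD, INSTR, and INITCAP.
def lpad(s, length, pad=" "):
    # pad on the left up to length; truncate if s is already longer
    return (pad * length + s)[-length:] if len(s) < length else s[:length]

def instr(s, sub, start=1):
    # 1-based position of sub in s; 0 when not found
    return s.find(sub, start - 1) + 1

def initcap(s):
    # capitalize the first letter of each word
    return s.title()
```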

Informatica Functions
Conversion functions, used to convert datatypes:
TO_CHAR (numeric), TO_DATE, TO_DECIMAL, TO_FLOAT, TO_INTEGER, TO_NUMBER

Date functions, used to round, truncate, or compare dates; extract one part of a date; or perform arithmetic on a date:
ADD_TO_DATE, DATE_COMPARE, DATE_DIFF, GET_DATE_PART, LAST_DAY, ROUND (date), SET_DATE_PART, TO_CHAR (date), TRUNC (date)

To pass a string to a date function, first use the TO_DATE function to convert it to a date/time datatype.
60
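The TO_DATE-then-date-arithmetic pattern looks like this in Python terms: parse the string first, then operate on the parsed date. The format string and the days-only `add_to_date` are simplifying assumptions for the sketch.

```python
# Parse a string into a date before doing date arithmetic on it,
# mirroring TO_DATE followed by ADD_TO_DATE.
from datetime import datetime, timedelta

def to_date(s, fmt="%m/%d/%Y"):
    return datetime.strptime(s, fmt)

def add_to_date(d, days):
    # simplified ADD_TO_DATE handling only a day offset
    return d + timedelta(days=days)

d = add_to_date(to_date("01/31/2009"), 1)
```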

Informatica Functions
ABS CEIL CUME EXP FLOOR LN LOG MOD MOVINGAVG MOVINGSUM POWER ROUND SIGN SQRT TRUNC

Numerical Functions Used to perform mathematical operations on numeric data Scientific Functions Used to calculate geometric values of numeric data

COS COSH SIN SINH TAN TANH

61

System Variables
$$$SessStartTime
- Returns the system date value as a string, using the system clock on the machine hosting the Informatica server
- The format of the string is database-type dependent
- Used in SQL overrides
- Has a constant value throughout the session

SYSDATE
- Provides the current datetime on the Informatica server machine
- Not a static value

62

Mappings

63

Mappings
By the end of this section you will be familiar with: Mapping components Source Qualifier transformation Mapping validation Data flow rules Mapping parameters and variables

64

Mapping Designer

Transformation Toolbar Mapping List Iconized Mapping

65

Source Qualifier Transformation


Represents the source record set queried by the server. Mandatory in mappings using relational or flat file sources.
- Active transformation
- Connected mode only
- Ports: all input/output
- Usage: modify the SQL statement*, user-defined join, source filter, sorted ports*, select DISTINCT*, convert datatypes, pre/post SQL
(* relational sources only)
66

Source Qualifier Properties


User can modify SQL SELECT statement (DB sources) Source Qualifier can join homogenous tables User can modify WHERE clause User can specify ORDER BY manually or automatically Pre and Post SQL can be provided SQL properties do not apply to flat file sources

67

Pre SQL and Post SQL rules


Pre SQL and Post SQL:
- Can use any command that is valid for the database type; no nested comments
- Can use mapping parameters and variables in SQL executed against the source
- Use a semicolon (;) to separate multiple statements
- The Informatica Server ignores semicolons within single quotes, double quotes, or within /* ... */ comments
- To use a semicolon outside of quotes or comments, escape it with a backslash (\)
- The Workflow Manager does not validate the SQL

68
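The statement-splitting rules above can be sketched as a small scanner: semicolons separate statements, except inside quotes or /* */ comments, and a backslash escapes a literal semicolon. This is a minimal sketch of the described rules, not the server's actual parser.

```python
# Split SQL into statements on ';', honoring quotes, /* */ comments,
# and backslash-escaped semicolons.
def split_statements(sql):
    stmts, buf = [], []
    i, n = 0, len(sql)
    in_quote, in_comment = None, False
    while i < n:
        c = sql[i]
        if in_comment:
            buf.append(c)
            if sql[i:i+2] == "*/":
                buf.append("/"); i += 1; in_comment = False
        elif in_quote:
            buf.append(c)
            if c == in_quote:
                in_quote = None
        elif sql[i:i+2] == "/*":
            buf.append("/*"); i += 1; in_comment = True
        elif c in ("'", '"'):
            buf.append(c); in_quote = c
        elif c == "\\" and i + 1 < n and sql[i+1] == ";":
            buf.append(";"); i += 1   # escaped semicolon stays literal
        elif c == ";":
            stmts.append("".join(buf).strip()); buf = []
        else:
            buf.append(c)
        i += 1
    if "".join(buf).strip():
        stmts.append("".join(buf).strip())
    return stmts
```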

Mapping Validation
Mappings must be valid to be run Mapping must be end-to-end complete All expressions must be valid Mapping must obey data flow rules Mappings are always validated when saved Mappings can be validated without being saved Output Window will show why a Mapping is invalid

69

Connection Validation
Examples of invalid connections in a Mapping:

Connecting ports with mismatched datatypes Connecting output ports to a Source Connecting a Source to anything but a Source Qualifier or Normalizer transformation Connecting an output port to an output port or an input port to another input port Connecting more than one active transformation to another transformation (invalid data flow)

70

Using Variable Ports


Variable Ports can be used to simplify complex expressions For example, it could hold a depreciation formula which would be referenced more than once in the transformation Used in the Expression, Aggregator, and Rank transformations Local to the transformation (a variable port cannot be an input or output port) Used in an output port expression or another variable port The scope of a variable port is a single transformation

71

Using Variable Ports


Variable ports can also be used for temporary storage. Variable ports can remember values across rows, which is useful for comparing values. Variables are initialized (numeric to 0, string to the empty string '') when the session starts. Variable ports are not visible in Normal view, only in Edit view.

Expression IIF(PREVIOUS_STATE = STATE, STATE_COUNTER + 1, 1)


72
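The IIF expression above, carried across rows by a variable port, behaves like this stateful loop in Python: the counter increments while STATE repeats and resets to 1 when it changes. The function and row layout are illustrative.

```python
# Variable-port state across rows: mimics
# IIF(PREVIOUS_STATE = STATE, STATE_COUNTER + 1, 1)
def state_counter(states):
    previous_state, counter = None, 0
    out = []
    for state in states:
        counter = counter + 1 if state == previous_state else 1
        previous_state = state          # remembered into the next row
        out.append((state, counter))
    return out

result = state_counter(["CA", "CA", "NY", "NY", "NY", "CA"])
```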

Mapping Parameters and Variables


Apply to all transformations within one Mapping Represent declared values Variables can change in value during run-time Parameters remain constant during run-time Provide increased development flexibility Defined in Mapping menu Format is $$VariableName

73

Functions to Set Mapping Variables


- SetCountVariable: counts the number of evaluated rows, incrementing or decrementing a mapping variable for each row
- SetMaxVariable: sets a mapping variable to the higher of two values
- SetMinVariable: sets a mapping variable to the lower of two values
- SetVariable: sets the value of a mapping variable to a specified value

74
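The four variable-setting functions can be modeled on a plain dict of mapping variables. The function names mirror the slide; the dict-based state and `$$`-prefixed names are assumptions made for the sketch.

```python
# Mapping-variable setters modeled over a dict of variables.
def set_max_variable(vars, name, value):
    vars[name] = max(vars.get(name, value), value)

def set_min_variable(vars, name, value):
    vars[name] = min(vars.get(name, value), value)

def set_count_variable(vars, name, delta=1):
    vars[name] = vars.get(name, 0) + delta

def set_variable(vars, name, value):
    vars[name] = value

v = {}
for order_id in [7, 3, 9]:          # one call per evaluated row
    set_max_variable(v, "$$MaxOrderID", order_id)
    set_count_variable(v, "$$RowCount")
```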

Mapping Parameters and Variables


Sample declarations
Set the appropriate aggregation type

Variables and Parameter are defined in the Designer Mapping menu


75

Mapping Parameters and Variables

76

Object Export and Import


Objects can be exported or imported in the Designer and the Workflow Manager Export/Import is in XML format Designer exports
Source definitions, target definitions, transformations, mappings

77

Lab Create a Mapping

78

Session Tasks

79

Session Tasks
After this section, you will be able to describe:
- Session Task properties
- How to create and configure Session Tasks
- Transformation overrides
- Session partitions

80

Workflow Manager Window


Task Tool Bar

Workflow Designer Tools

Navigator

Workspace

Output Window Status Bar

81

Workflow Manager Tools


Task Developer
Enables construction of Session tasks, Shell Command Tasks and Email Tasks Tasks created in the task developer are reusable

Worklet Designer
Creates objects that represent a set of tasks Objects are reusable

Workflow Designer
Maps the execution of Sessions, Tasks and Worklets for the Informatica server

82

Session Task
Created to run a mapping (one mapping only) Session Tasks can be created in the Task Developer or Workflow Developer Steps to create a Session Task
Choose the Session button from the Tasks toolbar, or choose menu Tasks | Create

Session Task Bar Icon

83

Session Task
Steps to create a Session Task (continued)
Double click on the session object Valid Mappings are displayed in the dialog box

Session Task tabs


General Properties Config Object Sources Targets Components Transformation Partitions Metadata Extensions
84

Session Task - General

85

Session Task - Properties

General Options

Performance Tab

86

Session Task Config Object


Advanced Tab

Log Options

Error handling

87

Session Task - Sources

Connections Tab

88

Session Task - Targets

Database Target Connection Example

Connections Heterogeneous Targets Supported

89

Session Task - Components


Email can be None, Non-reusable or Reusable (reusable email created using an Email Task)

90

Session Task - Transformation


Pre SQL and Post SQL Feature An example of this feature can be SQL to drop indexes before the start of a session or create indexes after a session run

91

Session Task - Partitions


New feature: additional partitioning types

92

Worklets

93

Worklets
- Represents a set of tasks
- Can contain any task available in the Workflow Manager
- Worklets run inside a Workflow and can be nested
- When to create a worklet: when we want to reuse a set of workflow logic in several workflows
- The Worklet Designer is used to create and edit worklets
- A worklet does not contain any scheduling or server information; to execute a worklet, include it in a workflow
- Informatica does not have a parameter file or log file for a worklet; worklet information is written to the workflow log

Types of worklets: Reusable, Non-reusable

94

Creating A Reusable Worklet

95

Creating A Non Reusable Worklet

96

Batch

97

Workflows

98

Workflows
By the end of this section you will be familiar with: Workflow Task properties Workflow properties Configuring Workflows Workflow Connections

99

Workflow Designer
Combines Session Tasks, other types of Tasks, and Worklets into a set of instructions for the Informatica Server to accomplish data transformation and load. The simplest Workflow that can be created is composed of a Start Task, a link, and a Session Task.

Link Start Task

Session Task

100

Task Developer
Provides the basic building blocks of a Workflow Three types of reusable Tasks can be created with the Task Developer:
Session - Set of instructions to execute a mapping Command - Specifies shell commands to run during the workflow Email - Sends email after the workflow has completed
Session Command Email

101

Command Task
Allows you to specify one or more Unix shell or DOS (NT, Win2000) commands to run during the Workflow Can be referenced in a Session through the Session Components tab as a Pre or Post Session command Can be referenced as a Task component in a Workflow or Worklet

102

Command Task

103

Email Task
Can be configured to have the Informatica Server to send email at any point in the Workflow Email can be configured in a Session or as a Task in a Workflow or Worklet

104

Email Task

Email Variables

105

Workflow Designer Tasks


Six additional Tasks are available in the Workflow Designer
Decision Assignment Timer Control Event Wait Event Raise

106

Workflow Designer Tasks


Two additional components are Worklets and Links Worklets are objects that contain a series of Tasks

Links are required to connect objects in a Workflow

107

Workflow Designer
Sample Workflow

[Diagram] Start Task (required) -> Session 1 -> Session 2 -> Post-Session Command, connected by Links (required)
108

Workflow Designer - Links


- Required to connect Workflow objects
- Can be used to create branches in a Workflow
- All links are followed unless a link condition evaluates to false

109

Workflow Designer - Links

Optional Link condition

110

Developing Workflows
Create a new workflow in the Workflow Designer

111

Developing Workflows
Enter the Workflow Properties Select a Workflow Schedule

112

Developing Workflows

Create the appropriate Workflow scheduling

113

Developing Workflows
Define events which can be used with the Raise Event Task

Define variables that can be used in later task objects (example: Decision Task)

114

Developing Workflows

Metadata Extensions provide for additional user data

115

Developing Workflows
Add Sessions and other Tasks to the Workflow Connect all Workflow components with Links Save the Workflow Start the Workflow
Save Start Workflow

Sessions can be started within a Workflow

116

Reusable Workflow Scheduler


Can be associated with multiple Workflows

Same configuration information previously discussed for a Workflow 117

Connections
Configure server data access connections

Configure: 1. Relational 2. MQ Series 3. FTP 4. Custom 5. External Loader

118

Creating a Connection
Relational (Database) connection

119

Connection Properties
Relational (Database) connection
Environment SQL SQL commands executed for each database connection

120

FTP Connection

121

External Loader Connection

122

Lab Create a Workflow

123

Monitor Workflows

124

Monitor Workflows
By the end of this section you will be familiar with: Workflow Monitor views Actions initiated from the Workflow Monitor Truncating Monitor Logs

125

The Workflow Monitor is the tool for monitoring Workflows and Tasks. It shows details about a workflow or task in two views:
- Gantt Chart view
- Task view
126

Monitor Workflows
Workflow Monitor displays Workflows that have been run at least once Can monitor a server in two modes: online and offline
Online - the Workflow Monitor continuously receives information from the Informatica Server and the Repository Server Offline mode - the Workflow Monitor displays historic information about past workflow runs by fetching information from the repository

127

Monitoring Workflows
You can perform the following tasks in the Workflow Monitor:
- Restart tasks, workflows, or worklets
- Stop a task, workflow, or worklet
- Abort a task, workflow, or worklet
- Resume suspended workflows after you fix the failed task
- View session and workflow logs

Stopping a Session Task means the server stops reading data. Abort has a timeout of 60 seconds; if the server has not finished processing and committing data by the timeout, the threads and processes associated with the session are killed.

128

Monitoring Workflows
Task View columns: Task Name, Workflow Name, Worklet Name, Start Time, Completion Time, Status

Actions: Ping Server; Start, Stop, Abort, Resume Tasks, Workflows, and Worklets; Get Session Logs (right-click on a Task)
129

Monitoring Workflows
Task View
Monitoring filters can be set using drop down menus

130

Monitoring Workflows
Truncating Monitor Logs
The Repository Manager's Truncate Log option clears the Workflow Monitor logs.
131

Lab Start and Monitor a Workflow

132

Load Manager - Server Process


The Load Manager is the primary PowerCenter Server process. It accepts requests from the PowerCenter Client and from pmcmd, and it runs and monitors the workflow. It performs the following tasks:
- Manages workflow scheduling
- Locks and reads the workflow
- Reads the parameter file
- Creates the workflow log file
- Runs workflow tasks and evaluates the conditional links connecting tasks
- Starts the DTM, which runs the session
- Writes historical run information to the repository
- Sends post-session email in the event of DTM failure

133

pmcmd is a program that you can use to communicate with the Informatica server. You can use pmcmd in the following modes:
- Command line mode: the command line syntax allows you to write scripts for scheduling workflows. Each command should include connection information for the Informatica server.
- Interactive mode: you establish and maintain an active connection to the Informatica server, which allows you to issue a series of commands.

Command | Mode(s) | Description
getserverdetails | Command line, interactive | Displays details including server status, information on active workflows, and timestamp information
connect/disconnect | Interactive | Connects to/disconnects from the Informatica server
help | Command line, interactive | Displays a list of pmcmd commands and syntax
waittask | Command line, interactive | Instructs the Informatica server to wait for the completion of a running task before starting another command

134

Session Parameter
Represents values that we might change between sessions, such as a database connection or a source file. Used in session properties and defined in a parameter file. Can be specified when we start a session using a pmcmd command.

Types of session parameters:
- Built-in ($PMSessionLogFile)
- User-defined (database connections, source file names, target file names)

Session parameters make sessions more flexible and session management easier. They do not have default values.

135

Session Parameter

136

LOG FILE
Informatica creates log files for each workflow that it runs. The log files contain information about the tasks the Informatica server performs and statistics about the workflow and all the sessions in the workflow. If the writer or the target database rejects data during a session run, the Informatica server creates a file that contains the rejected rows. Types of log files:

Workflow log Session log Reject file log

137

Parameter file
Parameterization gives a standard feel to sessions and workflows. Session parameters represent values that we have to change between sessions, such as a database connection or source file. Mapping parameters are given in the session properties and then defined in a parameter file. During the session run, the PowerCenter Server replaces all references to the parameter with that value.

138
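A parameter file of this kind can be read with a small parser. The exact PowerCenter header syntax varies by version; the INI-style section headers and `$$name=value` lines here are an illustrative assumption, as are the sample folder, workflow, and session names.

```python
# Illustrative parameter-file parser: sections per session, name=value
# lines for parameters within each section.
def parse_parameter_file(text):
    params, section = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                       # skip blanks and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            params[section] = {}
        elif "=" in line and section:
            name, _, value = line.partition("=")
            params[section][name.strip()] = value.strip()
    return params

sample = """
[Folder.WF:wf_load.ST:s_load_customers]
$$SourceFile=/data/in/customers.dat
$DBConnection_Target=ORA_DW
"""
parsed = parse_parameter_file(sample)
```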

Specifying Parameters In A Session

139

Filter transformation

140

Filter transformation
By the end of this section you will be familiar with: Filter properties

141

Filter transformation
Drops rows conditionally
- Active transformation
- Ports: all input/output
- Specify a filter condition
- Usage: filter rows from flat file sources; single-pass source(s) into multiple targets
142

Aggregator Transformation

143

Aggregator Transformation
By the end of this section you will be familiar with:
Aggregator properties Aggregator expressions Using sorted data

144

Aggregator Transformation
Performs aggregate calculations

- Active transformation
- Connected mode only
- Ports: mixed; variables allowed; Group By allowed
- Create the expression in an output port
- Usage: standard aggregations
145

Aggregator properties
Use to toggle Sorted input

Set Aggregator cache sizes on Informatica Server machine

146

Informatica Functions
Aggregate Functions
AVG COUNT FIRST LAST MAX MEDIAN MIN PERCENTILE STDDEV SUM VARIANCE

- Return summary values for non-null values in selected ports
- Aggregate functions can be used only in Aggregator transformations
- Calculate a single value for all records in a group
- Only one aggregate function can be nested within another aggregate function
- Conditional statements can be used with these functions
147

Aggregator expressions
Aggregate functions are supported only in the Aggregator Transformation

Conditional Aggregate expressions are supported Format example: SUM(value, condition)


148
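The SUM(value, condition) form above means only rows satisfying the condition contribute to the aggregate. In Python terms, with illustrative row dicts:

```python
# Conditional aggregation: sum a value only over rows where the
# condition holds, mirroring SUM(value, condition).
def conditional_sum(rows, value_key, condition):
    return sum(r[value_key] for r in rows if condition(r))

orders = [
    {"amount": 100, "status": "SHIPPED"},
    {"amount": 250, "status": "CANCELLED"},
    {"amount": 75,  "status": "SHIPPED"},
]
shipped_total = conditional_sum(orders, "amount",
                                lambda r: r["status"] == "SHIPPED")
```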

Sorted Data
The Aggregator can handle sorted or unsorted data. Sorted data can be aggregated more efficiently, decreasing total processing time The PowerCenter server will cache data from each group, and release the cached data upon reaching the first record of the next group Data must be sorted according to the group by Aggregator ports Performance gain will depend upon varying factors

149
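The sorted-input behavior described above is essentially streaming group-by: because the input is ordered on the group keys, each group can be emitted as soon as its last row is read, instead of caching every group. A minimal sketch using `itertools.groupby` (the row layout and key names are illustrative):

```python
# Streaming aggregation over input already sorted on the group keys:
# only one group's rows are held in memory at a time.
from itertools import groupby
from operator import itemgetter

def aggregate_sorted(rows, keys, value):
    key_fn = itemgetter(*keys)
    for group_key, group_rows in groupby(rows, key=key_fn):
        yield group_key, sum(r[value] for r in group_rows)

rows = [  # already sorted by store, then department
    {"store": "S1", "department": "D1", "sales": 10},
    {"store": "S1", "department": "D1", "sales": 5},
    {"store": "S1", "department": "D2", "sales": 7},
]
totals = list(aggregate_sorted(rows, ("store", "department"), "sales"))
```

With unsorted input this shortcut is unavailable, which is why the Aggregator must cache all groups until the last row arrives.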

Aggregating Unsorted Data


Group by: store, department, date Unsorted data

No rows are released from Aggregator until all rows are aggregated

150

Aggregating Sorted Data


Group by: store, department, date Data sorted by store, department, date

Each separate group (one row) is released as soon as that group is aggregated

151

Joiner transformation

152

Joiner transformation
By the end of this section you will be familiar with: When to use a Joiner Transformation Joiner properties Joiner expressions Nested joins

153

Homogeneous Joins
Joins that are done with a SQL SELECT statement:
- The Source Qualifier contains the join SQL
- Tables are on the same DB server (or are synonyms)
- The database server does the join work
- Multiple homogeneous tables can be joined

154

Heterogeneous Joins
Joins that cannot be done with a SQL statement:
- An Oracle table and a Sybase table
- Two Informix tables on different servers
- Two flat files
- A flat file and a database table

155

Joiner transformation
Performs heterogeneous joins on records from different databases or flat file sources
Active Transformation
Connected
Ports: all input or input/output; M denotes a port that comes from the master source
Specify the join condition
Usage:
Join two flat files
Join two tables from different databases
Join a flat file with a relational table
156

Joiner properties

Join type can be:
Normal (inner)
Master outer
Detail outer
Full outer

157

Joiner Conditions

Multiple conditions are supported

158

Nested joins
Used to join three or more heterogeneous sources

159

Mid-mapping join
The Joiner does not accept input in the following situations:
Both input pipelines begin with the same Source Qualifier
Both input pipelines begin with the same Normalizer
Both input pipelines begin with the same Joiner
Either input pipeline contains an Update Strategy
Either input pipeline contains a connected or unconnected Sequence Generator transformation

160

Lab Heterogeneous Join

161

Sorter Transformation

162

Sorter Transformation
By the end of this section you will be familiar with:
Sorter properties
Sorter limitations

163

Sorter Transformation
Can sort data from relational tables or flat files
The sort takes place on the Informatica server machine; no database server is involved in performing the sort
Multiple sort keys are supported
The Sorter transformation is often more efficient than a sort performed on a database with an ORDER BY clause

164

Sorter Transformation
One or more sort keys defined

Ascending or Descending sort order

165

Sorter Properties
Cache size can be changed
Minimum is 8 MB. Ensure the configured cache size is actually available on the Informatica server, or the Session Task will fail

166

Lookup Transformation

167

Lookup Transformation
By the end of this section you will be familiar with:
Lookup principles
Lookup properties
Lookup techniques

168

How a Lookup works


For each Mapping row, one or more port values are looked up in a database table. If a match is found, one or more table values are returned to the Mapping
Lookup value Lookup transformation

Return value

169

Lookup Transformation
Looks up values from database objects and provides to other components in a mapping
Passive Transformation
Connected/Unconnected
Ports: mixed; L denotes a Lookup port, R denotes the port used as the return value in an unconnected lookup
Specify the lookup condition
Usage:
Get related values
Verify whether a record exists or whether data has changed

170

Lookup properties
Can override Lookup SQL

Toggle caching

Database Connection Object name

171

Additional Lookup properties


Set cache directory

Make cache persistent

Set cache sizes

172

Lookup Conditions
Multiple conditions are supported

173

To cache or not to cache?


Can significantly impact performance Cached
Lookup table data is cached locally
Mapping rows are looked up against the cache
Only one SQL SELECT is needed

Uncached
Each Mapping row needs one SQL SELECT

Rule of thumb: cache if the number of records in the lookup table is small relative to the number of mapping rows
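The cached case can be sketched in Python: the lookup table is read once into a local structure, then each mapping row probes it in memory instead of issuing its own SELECT (the reference data here is invented):

```python
# Cached lookup: load the small lookup table once into a dict, then probe
# it per mapping row, instead of issuing one query per row.
lookup_table = [(101, "Regulator"), (102, "Gauge")]   # small reference data
cache = dict(lookup_table)                            # one "SELECT", built once

mapping_rows = [101, 102, 101, 999]
resolved = [cache.get(item_id) for item_id in mapping_rows]
print(resolved)  # ['Regulator', 'Gauge', 'Regulator', None]
```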
174

Lab Basic Lookup

175

Dynamic Lookup

176

Dynamic Lookup
By the end of this section you will be familiar with:
Dynamic lookup theory
Dynamic lookup advantages
Dynamic lookup limitations

177

Four Lookup Cache Options


Make Cache Persistent

Dynamic Lookup Cache
The cache is updated as rows pass through, so later rows can see rows inserted or updated earlier in the same run

Cache File Name Prefix
Reuse a named cache for another, similar business purpose

Recache from Database
Overrides the other settings; the lookup data is refreshed
178

Persistent caches
By default, Lookup caches are not persistent: when the Session completes, the cache is erased
The cache can be made persistent via the Lookup properties
When the Session completes, the persistent cache is stored in files on the server's hard disk
The next time the Session runs, the cached data is loaded fully or partially into RAM and reused
This can improve performance, but stale data may pose a problem
179

Dynamic Lookup Cache Advantages


When the target table is also the lookup table, the cache should change as the target is changed
New rows affect the dynamic lookup cache as they are inserted into the target or used to update the target
The dynamic lookup cache and the target remain synchronized throughout the session run

180

Dynamic Lookup Cache Limitations


Does not work for deletions from the target
The Lookup transformation must be used in connected mode
Only the equality ( = ) operator is allowed in the lookup conditions

181

Update Dynamic Lookup Cache


NewLookupRow port
0 = static lookup, cache not changed
1 = insert row into lookup cache
2 = update row in lookup cache

Does NOT change the row type
Use an Update Strategy transformation before or after the Lookup to flag rows for insert or update to the target

Ignore NULL property:
Set per port. Ignores NULL values from the input row and updates the cache using only the non-NULL input values
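The NewLookupRow codes above can be sketched with a plain dictionary standing in for the dynamic cache (keys and values are illustrative):

```python
# Sketch of dynamic-cache behaviour: each incoming row yields a
# NewLookupRow code -- 0 cache unchanged, 1 inserted, 2 updated.
cache = {101: "Regulator"}

def process(cache, key, value):
    if key not in cache:
        cache[key] = value
        return 1            # row inserted into lookup cache
    if cache[key] != value:
        cache[key] = value
        return 2            # row updated in lookup cache
    return 0                # static lookup, cache not changed

codes = [process(cache, k, v) for k, v in
         [(101, "Regulator"), (102, "Gauge"), (101, "Air Regulator")]]
print(codes)  # [0, 1, 2]
```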

182

Update Dynamic Lookup Cache


Ability to update rows in the lookup memory cache. Update Flags:
Insert else update. Apply to insert row type only
ON: If row FOUND in cache, perform update
OFF: If row FOUND in cache, perform lookup

Update else Insert. Apply to update row type only


ON: If row NOT FOUND in cache, perform insert
OFF: If row NOT FOUND in cache, perform lookup

183

Ports Tab in Dynamic Lookup Mode


NewLookupRow Associated Port

184

Example Dynamic Lookup Configuration


Router Group Filter Condition should be: NewLookupRow = 1

185

Update Strategy transformation

186

Update Strategy Transformation


By the end of this section you will be familiar with:
Update Strategy expressions
Refresh strategies
Incremental aggregation
Smart aggregation

187

Update Strategy Transformation


Used to specify how each individual row will be used to update target tables (insert, update, delete, reject)
Active Transformation
Connected
Ports: all input/output
Specify the Update Strategy expression
Usage:
Updating Slowly Changing Dimensions
IIF or DECODE logic determines how to handle the record
188

Update Strategy Expressions


Constants for the database operations:

INSERT  DD_INSERT  0
UPDATE  DD_UPDATE  1
DELETE  DD_DELETE  2
REJECT  DD_REJECT  3

189

Update Strategy Expressions


IIF ( score > 69, DD_INSERT, DD_DELETE )
The expression is evaluated for each row
Rows are tagged according to the logic of the expression
The appropriate SQL is submitted to the target database: insert, delete, or update
DD_REJECT means no SQL will be written for the row; the target will not see that row
Rejected rows may be forwarded through the Mapping
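A Python sketch of the row-tagging idea, using the IIF example above (the numeric values follow the constants table on the previous slide):

```python
# Row tagging a la IIF(score > 69, DD_INSERT, DD_DELETE): each row gets a
# strategy constant that decides which SQL the writer emits for it.
DD_INSERT, DD_DELETE = 0, 2

def tag(score):
    return DD_INSERT if score > 69 else DD_DELETE

tags = [tag(s) for s in (85, 42, 70)]
print(tags)  # [0, 2, 0]
```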
190

Target Refresh Strategies


Single snapshot: target truncated, new records inserted
Sequential snapshot: new records inserted
Incremental: only new records are inserted; records already present in the target are ignored
Incremental with Update: only new records are inserted; records already present in the target are updated

191

Router transformation

192

Router transformation
By the end of this section you will be familiar with:
Using a Router
Router groups

193

Router transformation
Rows sent to multiple filter conditions

Active Transformation
Connected
Ports: all input/output
Specify a filter condition for each group
Usage: link source data in one pass to multiple filter conditions
194

Router transformation in a Mapping

195

Router Groups
Input group (always one)
User-defined groups
Default group (always one)
Each group has one condition
ALL group conditions are evaluated for each row
Group outputs can be ignored

196

Reusable transformations

197

Reusable transformations
By the end of this section you will be familiar with:
Reusable transformation advantages
Reusable transformation limitations
Promoting transformations to reusable
Demoting reusable transformations

198

Transformation Developer

Reusable transformations

199

Reusable transformations
Define once, reuse many times
Reusable Transformations:
Can be a copy or a shortcut
Ports can be edited only in the Transformation Developer
Properties can be edited in the mapping
Instances dynamically inherit changes
Be careful: it is possible to invalidate mappings by changing reusable transformations

Transformations that cannot be made reusable:

Source Qualifier
ERP Source Qualifier
Normalizer used to read a COBOL data source
200

Promoting a Transformation to Reusable

Check box in Edit Mode (Transformation Tab)

201

Demoting Reusable Transformations


1. Drag a Reusable transformation from the Navigator window onto a mapping (Mapping Designer tool)
2. Hold down the Ctrl key while you are dragging
3. A message will appear in the status bar (shown below)
4. Drop the transformation into the mapping
5. Save the changes to the repository

Must be done within the same folder


202

Sequence Generator transformation

203

Sequence Generator transformation


By the end of this section you will be familiar with: Using a Sequence Generator

204

Sequence Generator transformation


Generates unique keys for records

Passive Transformation
Connected
Ports: two predefined output ports, NEXTVAL and CURRVAL; no input ports allowed
Usage:
Generate sequence numbers
Shareable across mappings

205

Target Options

206

Target options
By the end of this section you will be familiar with:
Row operations
Load types
Constraint-based loading
Error handling

207

Target properties
Session Task Chose target

Row operations Error handling

208

WHERE Clause for Update & Delete


PowerCenter uses the primary keys defined in the Warehouse Designer to determine the appropriate SQL WHERE clause for updates and deletes.

Update SQL:
UPDATE <target> SET <col> = <value> WHERE <primary key> = <pkvalue>
Only columns with values mapped to them are updated; all other columns in the target are unchanged
The WHERE clause can be overridden via the Update Override property

Delete SQL
DELETE from <target> WHERE <primary key> = <pkvalue>

The SQL statement used will appear in the Session log file
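The generated statements can be illustrated with SQLite; note that only the mapped column appears in SET, and both statements are keyed on the primary key (the schema is invented for the example):

```python
import sqlite3

# Primary-key-driven UPDATE and DELETE, as PowerCenter would generate them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tgt (item_id INTEGER PRIMARY KEY, price REAL, descr TEXT)")
conn.execute("INSERT INTO tgt VALUES (1, 100.0, 'Gauge')")

# Only the mapped column (price) is updated; descr is left unchanged.
conn.execute("UPDATE tgt SET price = ? WHERE item_id = ?", (150.0, 1))
conn.execute("DELETE FROM tgt WHERE item_id = ?", (2,))  # no matching key: no-op

print(conn.execute("SELECT * FROM tgt").fetchall())  # [(1, 150.0, 'Gauge')]
```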

209

Constraint-based Loading
Maintains referential integrity in the Targets

Example 1
With only One Active source, rows for Targets 1-3 will be loaded properly and maintain referential integrity


Example 2
With Two Active sources, it is not possible to control whether rows for Target 3 will be loaded before or after those for Target 2

The following transformations are Active sources: Advanced External Procedure, Source Qualifier, Normalizer, Aggregator, Sorter, Joiner, Rank, Mapplet (containing any of the previous transformations)

210

Error handling (row level)


Reject files on the server store data rejected by the writer and/or the target database
Conditions causing data to be rejected include:
Target database constraint violations, out-of-space errors, log space errors, null values not accepted, etc.


A data-driven result code of 3 (DD_REJECT)
Target table properties set to reject truncated/overflowed rows
Row indicator (first column): 0 = INSERT, 1 = UPDATE, 2 = DELETE, 3 = REJECT
0,D,1313,D,Regulator System,D,Air Regulators,D,250.00,D,150.00,D
1,D,1314,D,Second Stage Regulator,D,Air Regulators,D,365.00,D,265.00,D
2,D,1390,D,First Stage Regulator,D,Air Regulators,D,170.00,D,70.00,D
3,D,2341,D,Depth/Pressure Gauge,D,Small Instruments,D,105.00,D,5.00,D

Column-level indicators: D = Data (valid), O = Overflow, N = Null, T = Truncated

Transformation errors are written to the session log, not the .bad file
211

Concurrent and Sequential Workflows

212

Concurrent and Sequential Workflows


By the end of this section you will be familiar with:
Concurrent Workflows
Sequential Workflows
Scheduling Workflows
Stopping, aborting, and suspending Tasks and Workflows

213

Multi-Task Workflows
Tasks can be run sequentially, like this:

Tasks shown are all Sessions, but they can also be other Tasks such as Commands, Timer, Email, etc.

214

Multi-Task Workflows
Tasks can be run concurrently, like this:

Tasks shown are all Sessions, but they can also be other Tasks such as Commands, Timer, Email, etc.
215

Multi-Task Workflows
Tasks can be run in a combination concurrent and sequential pattern within one Workflow, like this:

Tasks shown are all Sessions, but they can also be other Tasks such as Commands, Timer, Email, etc.
216

Additional transformations

217

Additional transformations
By the end of this section you will be familiar with:
The Rank transformation
The Normalizer transformation
The Stored Procedure transformation
The External Procedure transformation
The Union transformation

218

Rank transformation
Filters the top or bottom range of records

Active Transformation
Connected
Ports: mixed; one predefined output port, RANKINDEX; variables allowed; Group By allowed
Usage: select a top/bottom number of records

219

Normalizer transformation
Normalizes records from relational or VSAM sources

Active Transformation
Connected
Ports: input/output or output
Usage:
Required for VSAM Source definitions
Normalize flat file or relational source definitions
Generate multiple records from one record

220

Turn one row

Normalizer transformation

YEAR,ACCOUNT,MONTH1,MONTH2,MONTH3, ... MONTH12
1997,Salaries,21000,21000,22000,19000,23000,26000,29000,29000,34000,34000,40000,45000
1997,Benefits,4200,4200,4400,3800,4600,5200,5800,5800,6800,6800,8000,9000
1997,Expenses,10500,4000,5000,6500,3000,7000,9000,4500,7500,8000,8500,8250

Into multiple rows
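The pivot the Normalizer performs can be sketched in Python for the Salaries row above: one record with twelve month columns becomes twelve rows:

```python
# Sketch of what the Normalizer does to the Salaries record: one row with
# twelve MONTHn columns becomes twelve (year, account, month, value) rows.
record = ("1997", "Salaries",
          [21000, 21000, 22000, 19000, 23000, 26000,
           29000, 29000, 34000, 34000, 40000, 45000])

year, account, months = record
normalized = [(year, account, m + 1, v) for m, v in enumerate(months)]
print(len(normalized), normalized[0])  # 12 ('1997', 'Salaries', 1, 21000)
```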

221

Normalizer transformation

Generated Column ID
222

Stored Procedure transformation


Calls a database stored procedure

Passive Transformation
Connected/Unconnected
Ports: mixed; R denotes a port that returns a value from the stored procedure to the next transformation
Usage: perform transformation logic outside PowerMart / PowerCenter
223

Union Transformation
A multiple-input-group transformation that can be used to merge data from multiple pipelines or pipeline branches into one pipeline branch
Similar to the UNION ALL SQL statement
The Union transformation does not remove duplicate rows
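The UNION ALL behaviour can be demonstrated with SQLite: duplicates survive the merge, just as with the Union transformation (table contents are made up):

```python
import sqlite3

# UNION ALL keeps duplicates, matching the Union transformation's behaviour.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pipe1 (item TEXT);
    CREATE TABLE pipe2 (item TEXT);
    INSERT INTO pipe1 VALUES ('gauge'), ('fin');
    INSERT INTO pipe2 VALUES ('gauge');
""")
merged = conn.execute(
    "SELECT item FROM pipe1 UNION ALL SELECT item FROM pipe2").fetchall()
print(merged)  # the duplicate 'gauge' row survives
```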

224

Mapping with a Union Transformation

225

External Procedure transformation (TX)


Calls a passive procedure defined in a dynamic linked library (DLL) or shared library
Passive Transformation
Connected/Unconnected
Ports: mixed; R designates the return value port of an unconnected transformation
Usage:
Perform transformation logic outside PowerMart / PowerCenter
Option to allow partitioning
226

Java Transformation
Transformation type: Active/Passive, Connected
Java transformation behavior is based on the following events:
The transformation receives an input row
The transformation has processed all input rows
The transformation receives a transaction notification such as commit or rollback

227

Steps to Define a Java Transformation


Create the transformation in the Transformation Developer or Mapping Designer
Configure input and output ports and groups for the transformation; use port names as variables in Java code snippets
Configure the transformation properties
Use the code entry tabs in the transformation to write and compile the Java code for the transformation
Locate and fix compilation errors in the Java code for the transformation
228

Transaction Control Transformation


Transformation type: Active Connected PowerCenter lets you control commit and roll back transactions based on a set of rows that pass through a Transaction Control transformation. A transaction is the set of rows bound by commit or roll back rows. You can define a transaction based on a varying number of input rows. You might want to define transactions based on a group of rows ordered on a common key, such as employee ID or order entry date.

229

Transaction Control Transformation


TC_CONTINUE_TRANSACTION. The Integration Service does not perform any transaction change for this row. This is the default value of the expression.
TC_COMMIT_BEFORE. The Integration Service commits the transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction.
TC_COMMIT_AFTER. The Integration Service writes the current row to the target, commits the transaction, and begins a new transaction. The current row is in the committed transaction.
TC_ROLLBACK_BEFORE. The Integration Service rolls back the current transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction.
TC_ROLLBACK_AFTER. The Integration Service writes the current row to the target, rolls back the transaction, and begins a new transaction. The current row is in the rolled-back transaction.
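As an illustration of transaction boundaries on a common key, here is a Python sketch that commits before each row that starts a new group, roughly the TC_COMMIT_BEFORE pattern (the data is invented):

```python
# Commit whenever the order key changes: each sublist below represents
# one committed transaction of target rows.
rows = [("A", 1), ("A", 2), ("B", 3), ("B", 4), ("C", 5)]

transactions, current, prev_key = [], [], None
for key, val in rows:
    if prev_key is not None and key != prev_key:
        transactions.append(current)   # commit before the row that starts a new group
        current = []
    current.append(val)
    prev_key = key
transactions.append(current)           # final commit at end of data
print(transactions)  # [[1, 2], [3, 4], [5]]
```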
230

HTTP Transformation
Transformation type: Passive Connected The HTTP transformation enables you to connect to an HTTP server to use its services and applications. When you run a session with an HTTP transformation, the Integration Service connects to the HTTP server and issues a request to retrieve data from or update data on the HTTP server, depending on how you configure the transformation.

231

SQL Transformation
Transformation type: Active/Passive Connected The SQL transformation processes SQL queries midstream in a pipeline. You can insert, delete, update, and retrieve rows from a database. You can pass the database connection information to the SQL transformation as input data at run time. The transformation processes external SQL scripts or SQL queries that you create in an SQL editor. The SQL transformation processes the query and returns rows and database errors

232

SQL Transformation Mode


Script mode Query mode

233

Conditional lookups
By the end of this section you will know the conditional lookup:
Technique
Advantages
Limitations

234

Conditional Lookup Technique


Two requirements:
An unconnected (function-mode) Lookup
The Lookup function used within a conditional statement
The conditional statement is evaluated for each row; the Lookup function is called only under the predefined condition

IIF ( ISNULL(item_id),:lkp.mylookup (sku_numb))
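A Python sketch of the saving: the lookup function runs only for rows where the condition holds, and a counter shows how rarely it is called (function and data names are hypothetical, echoing the expression above):

```python
# The lookup is invoked only when item_id is null; rows that already have
# an item_id never pay the lookup cost.
lookup_calls = 0

def mylookup(sku):
    global lookup_calls
    lookup_calls += 1
    return {"SKU-1": 101}.get(sku)     # stand-in for the lookup table

rows = [(None, "SKU-1"), (202, "SKU-2"), (303, "SKU-3")]
resolved = [mylookup(sku) if item_id is None else item_id
            for item_id, sku in rows]
print(resolved, lookup_calls)  # [101, 202, 303] 1
```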


235

Unconnected lookup
Always literally unconnected from other transformations. There are no blue data flow arrows leading to or from an unconnected lookup

Lookup function can be used within any transformation that supports expressions, such as this Aggregator

Unconnected lookup

236

Conditional Lookups

237

Conditional lookup Advantage


The lookup is performed only for those rows which require it. This may be a small fraction of the total rows.
Example: A Mapping will process 500,000 rows. In approximately two percent of those rows (10,000) the item_id value is null. Using the expression below, the Lookup function will be called 10,000 times. Without a conditional lookup, the lookup would be performed 500,000 times. Net savings = 490,000 lookups.
The condition is true for two percent of all rows; the lookup function is called only when the condition is true

IIF ( ISNULL(item_id), :lkp.mylookup (sku_numb))


238

Conditional lookup Limitation


Main limitation: one lookup port value may be returned for each lookup. The return port is defined in the Ports tab

Return port

WARNING ! If the return port is not defined, the lookup function expression will be invalid

239

Connected and Unconnected Lookups


An unconnected lookup is also called function mode. Syntax: :lkp.lookupname(portname)

CONNECTED LOOKUP
Part of the mapping data flow
Returns multiple values (by linking output ports to another transformation)
Executed for every record passing through the transformation
More visible: shows where the lookup values are used
Default values are used

UNCONNECTED LOOKUP
Separate from the mapping data flow
Returns one value (by checking the Return (R) port option for the output port that provides the return value)
Only executed when the lookup function is called
Less visible, as the lookup is called from an expression within another transformation
Default values are ignored
240

Heterogeneous Targets

241

Heterogeneous Targets
By the end of this section you will be familiar with:
Heterogeneous target types
Heterogeneous target limitations
Target conversions

242

Definition: Heterogeneous Targets


Supported target definition types:
Relational database
Flat file
XML
ERP (SAP BW, PeopleSoft, etc.)
A heterogeneous target is one where the target types differ, or the target database connections differ, within a single Session Task
243

Method one: different target types


Oracle table

Oracle table

Flat file

244

Method two: different database connections


Database connections are different. May be different database types or different connection strings, etc.

245

Target Type Override (Conversion)


Example: a Mapping has SQL Server target definitions. The Session Task can be set to load Oracle tables instead, using an Oracle database connection.
Limitations: only the following overrides are supported:
Relational target to flat file target
Relational target to any other relational database type
SAP BW target to a flat file target

246

Mapplets

247

Mapplets
By the end of this section you will be familiar with:
Mapplet advantages
Mapplet types
Mapplet limitations

248

Mapplet Designer

Mapplet Designer Tool

Mapplet Transformation Icons

Mapplet Output Transformation


249

Mapplet Advantages
Useful for repetitive tasks / logic
Represents a set of transformations
Mapplets are reusable
Use an instance of a Mapplet in a Mapping
Changes to a Mapplet are inherited by all instances
The server expands the Mapplet at runtime

250

A Mapplet within a Mapping

251

The detail inside the Mapplet

252

Unsupported Transformations
You may use any transformation in a Mapplet except:
XML Source definitions
COBOL Source definitions
Normalizer
Pre- and post-Session stored procedures
Target definitions
Other Mapplets

253

Mapplet Input Types


Internal: one or more Source definitions / Source Qualifiers within the Mapplet
External: the Mapplet contains a Mapplet Input transformation and receives data from the Mapping it is embedded within
One or the other is required in every Mapplet; Informatica does not allow both input types within a single Mapplet

254

Data Source Inside a Mapplet


Mapplet contains sources defined WITHIN the mapplet logic

The resulting Mapplet has no input ports
When this Mapplet is used in a Mapping, it must be the first object in the data flow

255

Data Source Outside a Mapplet

A Mapplet Input transformation has NO input ports of its own
Only ports connected from the Input transformation to another transformation display in the resulting Mapplet
When connecting ports from the Input transformation, you may not connect the same port to more than one transformation

Transformation

Transformation

256

Mapplet Output
Use a Mapplet Output transformation
Define Mapplet Output ports
Mapplets must contain at least one Output transformation
An Output transformation must have at least one port connected to another transformation within the Mapplet

257

Mapplet with Multiple Output Groups

Output to multiple instances of the same target table


258

Unmapped Mapplet Output Groups

Mapplet Output Group NOT mapped

259

Active and Passive Mapplets


Active Mapplets contain one or more active transformations
Passive Mapplets contain only passive transformations
Be careful: changing a passive Mapplet into an active Mapplet may invalidate Mappings which use that Mapplet

260

Using Active and Passive Mapplets


Multiple Passive Mapplets can populate the same target instance

Passive

Active

Multiple Active Mapplets or Active and Passive Mapplets cannot populate the same target instance

261

Repository Topics
By the end of this section you should be familiar with:
The purpose of the Repository Server and Agent
The Repository Manager interface
Repository maintenance operations
Security and privileges
Object sharing, searching, and locking

262

Repository Service
Each repository has an independent architecture for the management of the physical repository tables Components: one Repository Service
Informatica Admin Console

Domain

Repository Agent(s)

Client overhead for repository management is greatly reduced by the Repository Service

Repository
263

Repository Admin Console


Functions available to the administrator:
New repository creation
Stop
Backup
Upgrade

264

Repository Management
Perform all repository maintenance tasks using the Informatica Admin Console
Maintenance tasks: Create, Copy, Backup, Restore, Upgrade, Register, Un-register, Delete, Notify Users, Last activity log
265

Repository Manager interface


Navigator Window

Analysis Window

Dependency Window

Output Window

266

Steps:

Groups and Privileges (Admin Console)


Create groups
Assign privileges to groups
Create users
Assign users to groups
Assign additional privileges to users (optional)

267

User Management
GROUP STRUCTURE

Group: Administrators
Users: Administrator, Database user
Privileges: (all privileges)

Group: Public
Users: as defined
Privileges: Use Designer, Browse Repository, Use Workflow Manager

Other groups: users and privileges as defined

SECURITY CONTROL

Security: Privileges
Access to: Repository
Issued by: Super User, Administrator
Issued to: Groups, Users

Security: Permissions
Access to: Folder
Issued by: Super User, Folder Owner, Administrator
Issued to: Folder Owner, Owner's Group, Repository


268

Repository Permissions in the Admin Console


Assign one user as the folder owner for first-tier permissions
Select one of the owner's groups for second-tier permissions
All users and groups in the repository are assigned the third-tier permissions
269

User Connections & Locks


View details for users connected to the repository
Allows you to terminate user connections

270

Thank You

271
