Informatica PowerCenter 7 Advance Training

Course Objectives
 PowerCenter/PowerMart Architecture  PowerCenter Advanced Transformations  Informatica Performance Tuning  Informatica Templates


PowerCenter Architecture


This section will cover  Informatica Architecture & Connectivity - 6.x and 7.x  Informatica Architecture & Connectivity – 5.x

 Architectural Differences


Informatica Architecture & Connectivity – 6.x and 7.x

native ODBC


Heterogeneous Sources

Repository Server

Heterogeneous Targets


Repository Agent

Repository Designer Workflow Workflow Rep Server Manager Manager Monitor Administration Console



Informatica Architecture – 5.x
Designer Server Manager Repository Manager

Sources 1-n

Targets 1-n



Informatica Connectivity – 5.x
Network Protocol Native Drivers ODBC
Source(s) Target(s)

Native / ODBC

Client Repository Manager

Native / ODBC

Designer Repository Server Manager ODBC Native


Network Protocol


Architectural Differences
Sr. No. 1 Informatica 5.x Only one server component - Informatica server which is ETL engine of Informatica suite Informatica 6.x and 7.x versions Two server components - Repository Server (for managing communication of different Informatica client and server tools with Metadata repository) and Informatica Server (ETL engine of Informatica suite) All client tools and Informatica server communicate with Repository server over TCP/IP. Repository server connects to Metadata repository using Repository Agent processes which use native connectivity for accessing metadata Workflows are executable components Workflow Manager tool used for registering Informatica Server with repository, creation of tasks, worklets and workflows and scheduling of workflows. Workflow Monitor used for load-monitoring Versioning is in-built in the suite. In addition, there are built-in configuration management tools in order to migrate components between environments such as Dev, Test and Prod supporting a lights out migration Repository Server Administrator Console is a new client tool for maintaining and managing repository PMREPAGENT Command-line utility now available for Repository maintenance 8 PMREP used for User Management


Native or ODBC connectivity used for communication between Metadata repository and client tools and Informatica server Sessions are executable components Server Manager client tool used for registering Informatica Server with repository, creation of sessions and batches and scheduling the same as well as load-monitoring Third-party tools required for versions of Informatica components managing

3 4



Repository Manager used for all maintenance tasks like Backup and Restore, copy, creating repository contents, etc PMREP Command-line utility used for all User management as well as Repository Maintenance


PowerCenter Advanced Transformations


This section will cover  Transformation Concepts – Recap  Stored Procedure Transformation (Lab 1)

 Filter  Aggregator
 Pivoting Data (Rows to columns) (Lab 2)

 Incremental Aggregation (Lab 3,4)  Implementation using mapping variables and Aggregator
 Implementation using SP and Aggregator  Implementation using Filter and Aggregator

 Joiner
 Self Join (Lab 5)  Joiner to emulate Lookup Contd…

 Normalizer
 Converting Columns to Rows (Lab 6)  Sourcing data from VSAM files

 Understanding Transaction Control  Transaction Control (Lab 7)
 Transaction Control to restart sessions

 Lookup
 Dynamic Lookup  Unconnected Lookup

 Union Transformation (Lab 8)


What is a transformation?
 A transformation is repository object that generates, modifies and/or passes data to other repository object


Transformation Types
 Active Transformation  The number of rows that pass through this transformation change, i.e. Number of rows in the transformation may or may not be equal to number of rows out of the transformation  Example – Number of rows coming in Filter transformation may be 10. But depending on condition, number of rows out of Filter can be anywhere between Zero and 10

 Passive Transformation
 The number of rows into the transformation is always equal to number of rows out of the transformation  Example – Number of rows into Expression is always equal to number of rows out of the transformation


Stored Procedure Transformation
 A stored procedure is a precompiled collection of Transact-SQL, PL/SQL or other database procedural and flow control statements. To call this from within Informatica, Stored Procedure Transformation is used  This passive transformation can be connected or unconnected  The stored procedure transformation can be scheduled to run  Normal – Runs like any other inline transformation  Pre-load of the Source – Runs before session retrieves data from the Source  Post-load of the Source – Runs after session retrieves data from the Source  Pre-load of the Target – Runs before session sends data to the Target Contd…

Stored Procedure Transformation
 Post-load of the Target – Runs after session sends data to the target

 Informatica supports following databases for Stored Procedure Transformation:
 Oracle  Informix  Sybase SQL Server  Microsoft SQL Server  IBM DB2

 Teradata

Stored Procedure Transformation
 Stored Procedure Transformation Ports
 Input/Output Parameters – send and receive input and output parameters using ports, variables or values hard-coded in expression  Return Values  Status Codes – The stored procedure issues a status code that notifies whether or not it completed successfully. PowerCenter server uses these to determine weather to continue running the session or stop it

 Stored Procedure Transformation Properties
 Stored Procedure Name – Name of the procedure/function as in the database  Connection Information – Database containing the stored procedure. Default value is $Target  Call Text – This is used only when transformation type is not Normal. Input parameters passed to SP can be included within the call text

Stored Procedure Transformation


Filter - Basics
 Filter transformation allows to filter rows in a mapping  It is active transformation and is always connected

 All ports in Filter transformation are input/output ports
 Rows from multiple transformations cannot be concatenated into Filter Transformation. Thus input ports for Filter must come from a single transformation

 Any expression that returns Boolean value can be used as a filter condition
 Filter condition is case sensitive


Filter – Performance Tips
 Place the filter as close as possible to the Source to maximize session performance


Aggregator - Basics
 Aggregator transformation performs aggregate calculations like sum, max, min, count on groups of data

 Aggregate transformation uses Informatica Cache to perform these aggregations
 Typically Aggregator has  Aggregate expressions in output port

 Group By port to create groups. The port can be I, I/O, V or O port
 Aggregator is Active transformation and is always connected


Aggregator Cache


Aggregator – Performance Tips
 Connect only necessary input/output ports to subsequent transformations, reducing the size of data cache

 Pass data to Aggregator sorted by Group By ports, in ascending or descending order
 Use Incremental Aggregation if changes in Source are less than half the target


Aggregator – Pivoting Data (Rows to Columns)
 In many scenario‟s, source data rows need to be converted to columns

 Example: Source contains two fields, field_name and field_value. These are to be converted into a target table which has column names as given by list of field_names in Source
 Aggregator transformation and FIRST function can be used for converting rows into columns  Elaborating above example, target field names are First_Name, Last_Name, Middle_Name and Salutation o_First_Name = FIRST(field_value, field_name = „FIRSTNAME‟)


Aggregator – Pivoting Data (Rows to Columns)


Incremental Aggregation
 When to use Incremental Aggregation?  New source data can be captured  Incremental changes do not significantly change the target  When aggregation does not contain Percentile or median 2 functions  When mapping includes aggregator with Transaction Transformation scope  How to capture Incremental Data?  Using Filter and Variables

 Using Stored Procedure


Incremental Aggregation
 Session failure scenarios when using Incremental Aggregation  When PowerCenter Server data movement mode is changed from ASCII to Unicode or vise-versa  When PowerCenter Server code page is changed to incompatible code page  When session sort order is changed for PowerCenter server in Unicode mode  When „Enable High Precision‟ option is changed in a session To avoid session failures for above, reinitialize the cache by deleting the incremental aggregation files


Incremental Aggregation
 Configuring Session  Set location of cache files. By default index and data files are created in $PMCacheDir  Cache filenames can be written to session log by setting tracing level to Verbose Init. In session log, path after TE_7034 gives the location of index file and path after TE_7035 gives the location of data file  Incremental Aggregation Session Properties can be set in Session -> Properties -> Performance


Incremental Aggregation – Using Mapping Variable

Mapping Variable


Incremental Aggregation – Using Filter


Joiner - Basics
 Joiner Transformation joins two sources with at least one matching port  Joiner requires inputs from two separate pipelines or two branches from one pipeline  Joiner is an Active transformation and is always connected  Joiner transformation cannot be used in following cases:  Either input pipeline contains an Update Strategy  A Sequence Generator transformation is directly connected before a Joiner transformation  Joiner conditions for string comparison are case sensitive  Join types supported are Normal, Master Outer, Detail Outer and Full Outer


Joiner - Cache


Joiner – Debugging Common Errors
"Assertion/Unexpected condition at file tengine.cpp line 4221" running session with multiple Joiners Product : PowerMart Problem Description: When running a session with multiple Joiner Transformations, the following error occurs: On NT: MAPPING> CMN_1141 ERROR: Unexpected condition at file:[d:\cheetah\builder\powrmart\server\dmapper\trans\tengine.cpp] line:[4221]. Application terminating. Contact Informatica Technical Support for assistance. On Unix: Session fails with no error and the following assertion appears on the standard output (stdout) of the pmserver machine. Assertion failed: ((m_teFlags) & (0x00000002)), file /export/home1/build50/pm50n/server/dmapper/trans/tengine.cpp, line 4221

Joiner – Debugging Common Errors
Solution: This is because of an invalid master/detail relationship for one or more of the Joiner Transformations. Informatica reads all the master rows before it starts reading the detail. So if there are multiple joiners in the mapping, make sure the master/detail relationships are defined in such a way that the server will be able to define the DSQ order.


Self Join
 Scenario of self join is encountered when some aggregated information is required along with Transaction data in target table

 Using Joiner transformation, this can be achieved in Informatica PowerCenter version 7.x
 This feature was not available in earlier Informatica versions


Self Join

Aggregator and sorter outputs combined in Joiner transformation


Joiner to emulate Lookup
A joiner can be used to emulate the lookup, but with certain limitations: - Only equi- conditions are supported. - Duplicate matching is not supported. - Joiner will always cache, Non-Cached is not an option. Here are the settings that need to be made on the Joiner for it to emulated the lookup: 1. The table that is used for the lookup must be brought into the mapping as a source. 2. The "lookup" table should be joined to the other table with an outer join. The outer join must be based on the "lookup" table, i.e. if it is the Detail table then it should be a Detail Outer Join, if it is the Master table then it should be a Master Outer Join.


Joiner to emulate Lookup
3. The Condition must set in the same way that it would have been in the Lookup. Note: Only equi- conditions are allowed. 4. The lookup values must be mapped downstream from the Joiner just as it would be with a Lookup Transformation.


Joiner to emulate Lookup


Normalizer - Basics
 Normalizer Transformation is used to organize the data in a mapping  COBOL Sources –  For COBOL sources, Normalizer acts as a Source Qualifier  These sources are often stored in normalized form, where OCCUR statement nests multiple records in single record  For relational sources, Normalizer can be used to create multiple rows from a single row of data  Normalizer is an Active transformation and is always connected


Normalizer – Ports
 When using normalizer, two keys are generated  Generated Key  One port for each REDEFINES clause  Naming convention - GK_<redefined_field_name>  At the end of each session the Power Center updates the GK value to the last value generated for the session plus one  Generated Column ID  One port for each OCCURS clause  Naming convention - GCID_<occurring_field_name>
Note: These are default ports created by Normalizer and cannot be deleted

Normalizer – Debugging Common Errors
A mapping that contains a Normalizer transformation fails to initialize with the following error: "TE_11017: Cannot match OFOSid with IFOTid"? Sol: This error will occur when the inputs to the Normalizer and outputs from it do not match. Resolutions: 1. Make sure there are no unconnected input ports to the Normalizer transformation. 2. If the Normalizer has an Occurs in it, make sure number of input ports matches the number of Occurs. Note: Second case is more common in mappings where Normalizer is not a Source Qualifier where definition is somehow changed by hand or damaged. To fix this, drop and then recreate the Normalizer transformation.

Normalizer – Debugging Common Errors
A session with a COBOL source crashes with the following error: WRITER_1_*_1> WRT_8167 Start loading table *********** FATAL ERROR : Caught a fatal signal/exception *********** *********** FATAL ERROR : Aborting the DTM process due to fatal signal/exception. *********** *********** FATAL ERROR : Signal Received: SIGSEGV (11) Sol: This error will occur when the normalizer output ports are connected to two downstream transformations. Contd…

Normalizer – Debugging Common Errors

1. Add an Expression Transformation right after the Normalizer
2. Connect all the ports from the Normalizer to the Expression Transformation 3. Connect this Expression Transformation to the existing downstream transformations as needed


Normalizer – Converting Columns to Rows

Set Occurrence Change Level to 2


Normalizer – VSAM Sources

Built-in seq


Understanding Transaction Control
 A transaction is a set of rows bound by COMMIT or ROLLBACK, i.e. the transaction boundaries

 Some rows may not be bound by Transaction boundaries and is an open transaction
 Transaction boundaries originate from transaction control points  Transaction Control Point is a transformation that defines or 1 redefines the transaction boundary


Transformation Scope
 PowerCenter server can be configured on how to apply transformation logic to incoming data with the Transformation Scope transformation property. Following values can be set:  Row – Applies the transformation logic to one row of data at a time. Examples are expression, external procedure, filter, lookup, normalizer (relational), router, sequence generator, stored procedure, etc.

 Transaction – Applies the transformation logic to all rows in a transaction. With this scope, PC server preserves incoming transaction boundaries. Examples are aggregator, joiner, Custom transformation, Rank, Sorter, etc.
 All Input – Applies the transformation logic on all incoming data. With this scope, PC server drops incoming transaction boundaries and outputs all rows from the transformation as an open transaction. For transformations like aggregator, joiner, Rank, Sorter and Custom transformation, this is default scope

Transaction Control Transformation – Basic
 Transaction Control transformation is used to define conditions to commit or rollback transactions from relational, XML and dynamic IBM MQ series targets.  Following built-in variables can be used to create transaction control expression  TC_CONTINUE_TRANSACTION – This is the default value of the expression  TC_COMMIT_BEFORE  TC_COMMIT_AFTER  TC_ROLLBACK_BEFORE  TC_ROLLBACK_AFTER  Transaction Control Transformation is an Active transformation and is always connected

Transaction Control Transformation – Guidelines
 If a mapping includes an XML target, and you choose to append or create a new document on commit, the input groups must receive data from the same transaction control point  Transaction Control transformation connected to any target other than relational, XML or dynamic IBM MQSeries targets is ineffective for those targets  Each target instance must be connected to Transaction Control transformation in a mapping where multiple targets are present  Multiple targets can be connected to single Transaction Control transformation  Only one effective Transaction Control transformation can be connected to a target  Transaction Control transformation cannot be connected in a pipeline branch that starts with a Sequence Generator transformation Contd…


Transaction Control Transformation – Guidelines
 If Dynamic Lookup transformation and Transaction Control transformation are used in same mapping, a rolled-back transaction might result in unsynchronized target data  A Transaction Control transformation may be effective for one target and ineffective for another target. If each target is connected to an effective Transaction Control transformation, mapping is valid  Examples of Effective/Ineffective Transaction Control transformation – Please refer Advanced training folder on Informatica Repository  Transaction Control transformation can be used to  Restart Failed sessions

 Constraint-based delete of rows with foreign key relations


Transaction Control Transformation


Lookup - Basics
 Lookup transformation is created when data is to be looked up  Lookup transformation is passive and can be connected or unconnected  Lookup is used to perform tasks like  Getting a related value (keys from dimension table)  Perform a calculation

 Update slowly changing dimension tables
 Lookup can be configured as  Connected or unconnected

 Relational or flat file lookup
 Cached or un-cached  For un-cached lookups, PowerCenter server queries the Lookup table for each input row

Lookup - Cache
 The PowerCenter Server builds a cache in memory when it processes the first row of data in a cached Lookup transformation

 The PowerCenter Server stores Condition Values in the Index Cache and Output Values in Data Cache
 Cache files are created in $PMCacheDir by default. These files are deleted and cache memory is released on completion of session, exception being Persistent Caches  Lookup cache can be  Persistent Cache  Static Cache  Dynamic Cache  Shared Cache

Lookup - Cache
 Persistent Cache – With this option, lookup cache files are created by Informatica Server the first time and can be reused the next time Informatica Server processes Lookup transformation, thus eliminating the time required to read the Lookup table. Lookup can be configured to use persistent cache when table does not change between sessions.  Static Cache – By default, PowerCenter Server creates static or read-only cache for any lookup table. The PowerCenter Server does not update the cache while it processes the Lookup Transformation  Dynamic Cache – Usual scenario where dynamic cache is used is when Lookup table is also a target table. The PowerCenter Server dynamically inserts or updates data in Lookup cache

 Shared Cache – Lookup cache can be shared between multiple transformations. Unnamed cache can be shared between transformations of single mapping. Named cache can be shared between transformations of single mapping or across mappings

Lookup - Cache


Unconnected Lookup
 An Unconnected Lookup transformation receives input values from the result of :LKP expression in another transformation  Unconnected Lookup transformation can be called more than once in a mapping  Unconnected Lookup transformation can return only single column as against connected lookup. This single column is designated as Return Port

 Unconnected Lookup does not support user-defined default value for return port. By default, return value is NULL
 Unconnected Lookup cannot have dynamic cache  Unconnected Lookup can be used in scenarios where  Slowly changing dimension tables are to be updated  Lookup is called multiple times  Conditional Lookup

Dynamic Lookup Cache Advantages
 When the target table is also the Lookup table, cache is changed dynamically as the target load rows are processed in the mapping  New rows to be inserted into the target or for update to the target will affect the dynamic Lookup cache as they are processed  Subsequent rows will know the handling of previous rows  Dynamic Lookup cache and target load rows remain synchronized throughout the Session run

Update Dynamic Lookup Cache
 NewLookupRow port values
• 0 – static lookup, cache is not changed • 1 – insert row to Lookup cache • 2 – update row in Lookup cache

 Does NOT change row type

 Use the Update Strategy transformation before or after Lookup, to flag rows for insert or update to the target
 Ignore NULL Property
• Per port • Ignore NULL values from input row and update the cache only with non-NULL values from input

Example: Dynamic Lookup Configuration
Router Group Filter Condition should be: NewLookupRow = 1

This allows isolation of insert rows from update rows

Dynamic Lookup Cache – Common Errors
"Lookup field has no associated port" when validating dynamic Lookup Transformation Sol: This message occurs when the “Dynamic Lookup Cache” lookup property is selected and one or more lookup/output ports have not been associated with an input port. To resolve this, go to the “Ports” tab and choose an Associated Port for each output/lookup port. Additional Information When a dynamic lookup cache is used, each lookup/output port must be associated with an appropriate input port or a sequence id. The session inserts or updates rows in the lookup cache based on data in associated ports. When a sequence-id is selected in Associated Port column, Informatica Server generates a key for new records inserted in lookup cache.


Dynamic Lookup Cache – Common Errors
"CMN_1650 A duplicate row was inserted into a dynamic lookup cache" running session Sol: This error occurs when the table on which the lookup is built has duplicate rows. Since a dynamic cached lookup cannot be created with duplicate rows, the session fails with this error.

Additional Information Following options are available to work around this:  Make sure there are no duplicate rows in the table before starting the session  Use a static cache instead of dynamic cache  Do a SELECT DISTINCT in the lookup cache SQL


Union Transformation - Basics
          Union Transformation is a multiple input group transformation that merges data from multiple pipelines or pipeline branches into a single pipeline branch Union transformation is an active transformation and is always connected. This transformation merges data similar to the Union All SQL statement, i.e. the transformation does not remove duplicates. Update Strategy or Sequence Generator transformations cannot be used upstream from a Union transformation. Union transformation is an example of Custom transformation. Editable tabs – Transformation, Properties, Groups, Group Ports Non-editable tabs – Ports, Initialization Properties, Metadata Extensions, Port Attribute Definitions Designer creates one Output group by default. You cannot edit or delete the output group Input ports can be created by copying ports from upstream transformations. Designer uses the port names specified on the Group Ports tab for each input and output ports, and it appends a number to make each port name in the transformation unique.


Union Transformation – Common Errors
New order of group ports in a Union Transformation is not reflected in the links Problem Description: After re-ordering the group ports in a Union Transformation the new order is not reflected in the links from the preceding transformation. Example: Suppose there are two ports A and B in a Source Qualifiers and UA and UB in a Union Transformation with port A connected to port UA and port B connected to port UB. Changing the order of the ports UA and UB in the Union Transformation links A to UB and B to UA.

Solution: This is a known issue (CR 83107) with PowerCenter 7.1.x.
Workaround: Delete the incorrect links and create new links.

Union Transformation – Common Errors
Union Transformation and XML Transformations are grayed out in the Designer

Problem Description: While editing a mapping, the following transformations are grayed out and cannot be added to a mapping or created in Transformation Developer Union, Mid-Stream XML Generator, Midstream XML Parser
Solution: The transformations are grayed out because they need to be registered in the PowerCenter repository. These transformation types were introduced in PowerCenter 7.x and are treated as "native" plug-ins to the PowerCenter Repository. Like any plugin, they need to be registered in the repository. In normal operations when the repository is created, they should be registered automatically. If the registration did not occur, these transformations are not available in the Designer. Contd…

Union Transformation – Common Errors
Resolution: To resolve this issue, you need to manually register these plug-ins. You can do so using pmrepagent commands from the command line. To register the plug-in, you need to locate the XML file for the plug-in. The XML file for a native plug-in is located in PowerCenter Repository Server native subdirectory. First navigate to the directory where the pmrepserver binary is installed. Then, change the command argument for database type to the database used for your repository. The following example shows how to register the Union transformation on Microsoft SQL Server and Oracle: On Windows, enter the following commands:
pmrepagent registerplugin -r RepositoryName -n Administrator -x Administrator -t "Microsoft SQL Server" -u DatabaseUserName -p DatabasePassword -c DatabaseConnectString -i native\pmuniontransform.xml

On UNIX, enter the following commands: pmrepagent registerplugin -r RepositoryName -n Administrator -x Administrator -t "Oracle" -u DatabaseUserName -p DatabasePassword -c DatabaseConnectString -i native/pmuniontransform.xml


Union Transformation


Datatype Sizes for calculating Cache
Datatype Binary Date/Time Decimal, high precision off (all precision) Decimal, high precision on (precision <=18) Decimal, high precision on (precision >18, <=28) Decimal, high precision on (precision >28) Decimal, high precision on (negative scale) Aggregator, Rank precision + 2 18 10 18 22 10 10 Joiner, Lookup

precision + 8 Round to nearest multiple of 8
24 16 24 32 16 16

Real Integer Small integer

10 6 6

16 16 16 Unicode mode: 2*(precision + 5) ASCII mode: precision + 9

NString, NText, String, Text

Unicode mode: 2*(precision + 2) ASCII mode: precision + 3


Datatype Sizes for calculating Sorter Cache


Cache Sizes
Transformation Name AGGREGATOR Min Index Cache Size Number of Groups * [(Summation( Column size) + 17] 200 * [(Summation of Column size) + 16] Number of input rows * [(Summation of Column size) + 16] (Sum of master column sizes in join condition + 16) X number rows in master table Number OF Groups * [(Sum of Column size) + 17] (Sum of master column sizes NOT in join condition but on output ports + 8) * number of rows in master table Number of Groups * [(number of ranks * ((Sum of Column size) + 10)) + 20] Max Index Cache Size Double Min Index Cache size Number of rows in Lookup table * [(Summation of Column size) + 16] * 2 Data Cache Size Number of Groups * [(Summation( Column size) + 7] Number of rows in Lookup table * [(Summation of Column size) + 8]






Informatica Performance Tuning


This section will cover  What is Performance Tuning  Measuring Performance

   

Bottlenecks Mapping Optimization Session Task Optimization Partitioning (Lab 9)


Performance Tuning
 There are two general areas to optimize and tune  External components to Informatica (OS, memory, etc.)  Internal to Informatica (tasks, mapping, workflows, etc.)  Getting data through the Informatica engine  This involves optimizing at the task and mapping level  This involves optimizing the system to make sure Informatica runs well  Getting data into and out of the Informatica engine  This usually involves optimizing non-Informatica components  The engine can‟t run faster than the source or target


Measuring Performance
 For the purpose of identifying bottlenecks we will use:
 “Wall Clock” time as a relative measurement time

 Number of rows loaded over the period of time (rows per


 Rows per second (rows/sec) will allow performance measurement of a session over a period of time and with changes in our environment.  Rows per sec can have a very wide range depending on the size of the row (number of bytes), the type of source/target (flat file or relational) and underlying hardware.

Identifying Bottlenecks
 Reader Processes  Configure session to read from Flat File target instead of relational target  Measure the performance  Writer Processes  Configure session to read from Flat File target instead of relational target  Measure the performance  Mapping  Generally if the bottleneck is not with the reader or writer process then the next step is to review the mapping  Mapping bottlenecks can be created by improper configuration of aggregator, joiner, sorter, rank and lookup caches

Mapping Optimizing
 Single-Pass Read  Use a single SQL when reading multiple tables from the same database.  Data type conversions are expensive  Watch out for hidden port to port data type conversions  Over use of string and character conversion functions  Use filters early and often  Filters can be used as SQL overrides (Source Qualifies, Lookups) and as transformations to reduce the amount of data processed

 Simplify expressions  Factor out common logic  Use variables to reduce the number of time a function is used

Mapping Optimizing
 Use operators instead of functions when possible
 The concatenation operator („||‟) is faster than the CONCAT


 Simplify nested IIFs when possible  Use proper cache sizing for Aggregators, Rank, Sorter, Joiner and Lookup transformations
 Incorrect cache sizing creates additional disk swaps that

can have a large performance degradation  Use the performance counters to verify correct sizing


Session Task Optimizing
 Run partitioned sessions
 Improved performance  Better utilization of CPU, I/O and data source/target


 Use “Incremental Aggregation” when possible
 Good for “rolling average” type aggregation

 Reduce transformation errors
 Put logic in place to reduce “bad data” such as nulls in a


 Reduce the level of tracing
Tip: See the “Velocity Methodology” Document for further information


Partitioned Extraction and Load
 Key Range  Round Robin

 Hash Auto Keys
 Hash User Keys  Pass Through


Partitioned Extraction and Load
 Key Range Partition
 Data is distributed between partitions according to

pre-defined range values for keys  Available in PowerCenter 5, but only for Source Qualifier.  Key Range partitioning can be applied to other transformations in 6.x and up  Common use for this new functionality:
 Apply to Target Definition to align output with physical partition scheme of target table.  Apply to Target Definition to write all data to a single file to stream into a database bulk loader that does not support concurrent loads (e.g. Teradata, DB2)

Partitioned Extraction and Load
 Key Range Partition (Continued)
 You can select input or input/output ports for the keys,

not variable ports and output only ports.

 Remember, the partition occurs BEFORE the transformation;

hence variable and output only are not allowed because they have not yet been evaluated

    

You can select multiple keys to form a Composite Key Range specification is: Start Range and End Range You can specify an Open Range also NULL values will go to the First Partition All unmatched rows will also go into First Partition, user will see the following warning message once in the log file:

TRANSF_1_1_2_1> TT_11083 WARNING! A row did not match any of the key ranges specified at transformation [EXPTRANS]. This row and all subsequent unmatched rows will be sent to the first target partition.

Partitioned Extraction and Load
 Round Robin Partitioning
 The Informatica Server evenly distributes the data to

each partition.  The user need not specify anything because key values are not interrogated  Common use:
 Apply to a flat file source qualifier when dealing with unequal

input file sizes – Use “user hash” when there are downstream lookups/joiners  Trick: All but one of the input files can be empty… you no longer have to physically partition input files – Note: There are performance implications with doing this. Sometimes it‟s better, sometimes it‟s worse.

Partitioned Extraction and Load
 Hash Partitioning
 Data is distributed between partitions according to a “hash”

function applied to the key values  PowerCenter supports Auto Hash Partitioning automatically for aggregator and rank transformations  Goal: Evenly distribute data, but make sure that like key values are always processed by the same partition  A “hash function” is applied to a set of ports.
 Hash function returns a value between 1 and the number of partitions.  A “good” hash function provides a uniform distribution of return values

– Not all 1s or all 2s, but an even mix.
 Based on this function‟s return value, the row is routed to

the corresponding partition (e.g. if 1, send to partition 1, if 2, send to partition 2, etc.)

Partitioned Extraction and Load
 Hash Auto Key
 No need to specify the keys to hash on. Automatically uses all

key ports (ex, Group By key or Sort key) as a composite key  Only valid for Unsorted Aggregation, Rank and Sorter  The default partition type for unsorted aggregator and rank  NULL values will get converted to zero for hashing.

 Hash Key
 Just like Auto Hash Key, but the user explicitly specifies the ports

to be used as the key.

 Only input and input/output ports allowed

 Common use:  When dealing with input files of unequal sizes, hash partition (vs. round-robin) data into downstream lookups and joiners to improve “locality of reference” of caches (hash on ports used in the lookup/join condition)  “Override” the auto hash for performance reasons

– Hashing is faster on numeric values vs. strings


Partitioned Extraction and Load
 Pass Through Partitioning
 Data is passed through to the next stage within the current

partition  Since data is not distributed, the user need not specify anything  Common use:
 Create additional stages (processing threads) within pipeline

to improve performance


Partitioned Extraction and Load
 Partition tab appears in Session Task from within the Workflow Manager


Partitioned Extraction and Load
 By default, session tasks have the following partition points and partition schemes:  Relational Source, Target (Pass Through)  File Source, Target (Pass Through)  Unsorted Aggregator, Rank (Auto Hash)  NOTE: You cannot delete the default partition points for sources and targets.
 You can NOT run debug session with number of partition >1


Partitioning Do’s
 Cache as much as possible in memory  Spread cache files across multiple physical devices both within and across partitions  Unless the directory is hosted on some kind of disk array, configure disk based caches to use as many disk devices as possible  “Round Robin” or “Hash” partition a single input file until you determine this is bottleneck  “Range Partition” to align source/target with physical partitioning scheme of source/target tables  “Pass through” partition to apply more CPU resources to a pipeline (when TX is bottleneck)


Partitioning Don’ts
 Don‟t add partition points if the session is already source or target constrained  Tune the source or target to eliminate the bottleneck  Don‟t add partition points if the CPUs are already maxed out (%idle < ~5%)  Eliminate unnecessary processing and/or buy more CPUs  Don‟t add partition points if system performance monitor shows regular “page out” activity  Eliminate unnecessary processing and/or buy more memory  Don‟t add multiple partitions until you‟ve tested and tuned a single partition  You‟ll be glad you did


Default Partition Points
Default partitioning points





Default Partition Points
Default Partition Point Default Partition Type Description

Source Qualifier or Normalizer Transformation Rank and unsorted Aggregator Transformation Target Instances


Controls how the Server reads data from the source and passes data into the source qualifier Ensures that the Server group rows before it sends them to the transformation Control how the instances distribute data to the targets

Hash auto-keys



Informatica Templates


This section will cover  Template for generating Mapping Specifications  Template for generating sessions/workflow specifications  Unit Test Plan for Informatica Mapping  Standards and Best Practices


Mapping Specifications


Session/Workflow Specifications



Sign up to vote on this title
UsefulNot useful