
Using Parameters, Variables and Parameter Files


Challenge
Understanding how parameters, variables, and parameter files work and using them for maximum efficiency.

Description
Prior to the release of PowerCenter 5, the only variables inherent to the product were those defined within specific
transformations and the server variables that were global in nature. Transformation variables were defined as
variable ports in a transformation and could be used only in that specific transformation object (e.g., Expression,
Aggregator, and Rank transformations). Similarly, global parameters defined within Server Manager affected
the subdirectories for source files, target files, log files, and so forth.
Later versions of PowerCenter make variables and parameters available across the entire mapping rather
than within a single transformation object. In addition, they provide built-in parameters for use within Workflow
Manager. Using parameter files, these values can change from session run to session run. With the addition of
workflows, parameters can now be passed to every session contained in the workflow, providing more flexibility
and reducing parameter file maintenance. Another important capability added in recent releases is the ability to
dynamically create parameter files that can then be used by the next session in a workflow or by other workflows.

Parameters and Variables


Use a parameter file to define the values for parameters and variables used in a workflow, worklet, mapping, or
session. A parameter file can be created using a text editor such as WordPad or Notepad. List the parameters or
variables and their values in the parameter file. Parameter files can contain the following types of parameters and
variables:
Workflow variables
Worklet variables
Session parameters
Mapping parameters and variables
When using parameters or variables in a workflow, worklet, mapping, or session, the Integration Service checks
the parameter file to determine the start value of the parameter or variable. Use a parameter file to initialize
workflow variables, worklet variables, mapping parameters, and mapping variables. If start values are not defined
for these parameters and variables in the parameter file, the Integration Service checks for the start value of the
parameter or variable in other places.
Session parameters must be defined in a parameter file. Because session parameters do not have default values,
if the Integration Service cannot locate the value of a session parameter in the parameter file, it fails to initialize the
session. To include parameter or variable information for more than one workflow, worklet, or session in a single
parameter file, create separate sections for each object within the parameter file.
Also, create multiple parameter files for a single workflow, worklet, or session and change the file that these tasks
use, as necessary. To specify the parameter file that the Integration Service uses with a workflow, worklet, or
session, do either of the following:
Enter the parameter file name and directory in the workflow, worklet, or session properties.
Start the workflow, worklet, or session using pmcmd and enter the parameter filename and directory in the
command line.
If entering a parameter file name and directory in the workflow, worklet, or session properties and in the pmcmd
command line, the Integration Service uses the information entered in the pmcmd command line.
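For example, a workflow can be started with a specific parameter file from the command line (a minimal sketch using the older pmcmd syntax shown later in this document; the server address, credentials, folder, parameter file, and workflow names are all hypothetical):

pmcmd startworkflow -s infaserver:4001 -u Administrator -p Password -f Production -paramfile /data/parmfiles/monthly.txt wf_MonthlyCalculations

The parameter file named on the command line takes precedence over any parameter file entered in the workflow or session properties.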

Parameter File Format


When entering values in a parameter file, precede the entries with a heading that identifies the workflow, worklet, or
session whose parameters and variables are to be assigned. Assign individual parameters and variables directly
below this heading, entering each parameter or variable on a new line. List parameters and variables in any order
for each task.
The following heading formats can be defined:
Workflow variables - [folder name.WF:workflow name]
Worklet variables - [folder name.WF:workflow name.WT:worklet name]
Worklet variables in nested worklets - [folder name.WF:workflow name.WT:worklet name.WT:worklet name...]
Session parameters, plus mapping parameters and variables - [folder name.WF:workflow name.ST:session name] or [folder name.session name] or [session name]
Below each heading, define parameter and variable values as follows:
parameter name=value
parameter2 name=value
variable name=value
variable2 name=value
For example, a session in the production folder, s_MonthlyCalculations, uses a string mapping parameter, $$State,
that needs to be set to MA, and a datetime mapping variable, $$Time. $$Time already has an initial value of
9/30/2000 00:00:00 saved in the repository, but this value needs to be overridden to 10/1/2000 00:00:00. The
session also uses session parameters to connect to source files and target databases, as well as to write the
session log to the appropriate session log file. The following table shows the parameters and variables that can be defined
in the parameter file:
Parameter and Variable Type                Parameter and Variable Name   Desired Definition
String Mapping Parameter                   $$State                       MA
Datetime Mapping Variable                  $$Time                        10/1/2000 00:00:00
Source File (Session Parameter)            $InputFile1                   Sales.txt
Database Connection (Session Parameter)    $DBConnection_Target          Sales (database connection)
Session Log File (Session Parameter)       $PMSessionLogFile             d:/session logs/firstrun.txt

The parameter file for the session includes the folder and session name, as well as each parameter and variable:
[Production.s_MonthlyCalculations]
$$State=MA
$$Time=10/1/2000 00:00:00
$InputFile1=sales.txt
$DBConnection_target=sales
$PMSessionLogFile=D:/session logs/firstrun.txt
The next time the session runs, edit the parameter file to change the state to MD and delete the $$Time variable.
This allows the Integration Service to use the value for the variable that was saved in the repository during the previous session run.

Mapping Variables
Declare mapping variables in PowerCenter Designer using the menu option Mappings -> Parameters and
Variables (See the first figure, below). After selecting mapping variables, use the pop-up window to create a
variable by specifying its name, data type, initial value, aggregation type, precision, and scale. This is similar to
creating a port in most transformations (See the second figure, below).

Variables, by definition, are objects that can change value dynamically. PowerCenter provides four functions for
changing the values of mapping variables:
SetVariable
SetMaxVariable
SetMinVariable
SetCountVariable
A mapping variable can store the last value from a session run in the repository to be used as the starting value for
the next session run.
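For instance, one of these functions can be placed in the expression of an output port in an Expression transformation (a minimal sketch; the variable name $$Run_Date and the use of the built-in SESSSTARTTIME value are illustrative assumptions, not part of the examples in this document):

SETVARIABLE($$Run_Date, SESSSTARTTIME)

The function returns the value it assigns, and, as noted later in this document for SETMAXVARIABLE, the port carrying the result generally must be connected to a downstream object for the assignment to take effect.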
Name. The name of the variable should be descriptive and be preceded by $$ (so that it is easily identifiable
as a variable). A typical variable name is: $$Procedure_Start_Date.
Aggregation type. This entry creates specific functionality for the variable and determines how it stores
data. For example, with an aggregation type of Max, the value stored in the repository at the end of each
session run would be the maximum value across ALL records until the value is deleted.
Initial value. This value is used during the first session run when there is no corresponding and overriding
parameter file. This value is also used if the stored repository value is deleted. If no initial value is identified,
then a data-type specific default value is used.
Variable values are not stored in the repository when the session:
Fails to complete.
Is configured for a test load.
Is a debug session.
Runs in debug mode and is configured to discard session output.

Order of Evaluation

The start value is the value of the variable at the start of the session. The start value can be a value defined in the
parameter file for the variable, a value saved in the repository from the previous run of the session, a user-defined
initial value for the variable, or the default value based on the variable data type. The Integration Service looks for
the start value in the following order:
1. Value in session parameter file
2. Value saved in the repository
3. Initial value
4. Default value

Mapping Parameters and Variables


Since parameter values do not change over the course of the session run, the value used is based on:
Value in session parameter file
Initial value
Default value
Once defined, mapping parameters and variables can be used in the Expression Editor section of the following
transformations:
Expression
Filter
Router
Update Strategy
Aggregator
Mapping parameters and variables also can be used within the Source Qualifier in the SQL query, user-defined
join, and source filter sections, as well as in a SQL override in the Lookup transformation.
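For example, a string mapping parameter can be referenced directly in a Source Qualifier source filter or SQL override; the Integration Service expands it textually before running the query (a minimal sketch against a hypothetical CUSTOMERS table, with the parameter in single quotes because it represents a string):

STATE = '$$State'

or, in a SQL override:

SELECT CUSTOMER_ID, STATE FROM CUSTOMERS WHERE STATE = '$$State'

With $$State=MA in the parameter file, the query runs with WHERE STATE = 'MA'.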

Guidelines for Creating Parameter Files


Use the following guidelines when creating parameter files:
Enter folder names for non-unique session names. When a session name exists more than once in a
repository, enter the folder name to indicate the location of the session.
Create one or more parameter files. Assign parameter files to workflows, worklets, and sessions
individually. Specify the same parameter file for all of these tasks or create several parameter files.
If including parameter and variable information for more than one session in the file, create a new
section for each session. The folder name is optional.

[folder_name.session_name]
parameter_name=value
variable_name=value
mapplet_name.parameter_name=value
[folder2_name.session_name]
parameter_name=value

variable_name=value
mapplet_name.parameter_name=value
Specify headings in any order. Place headings in any order in the parameter file. However, if defining the
same parameter or variable more than once in the file, the Integration Service assigns the parameter or
variable value using the first instance of the parameter or variable.
Specify parameters and variables in any order. Below each heading, the parameters and variables can
be specified in any order.
When defining parameter values, do not use unnecessary line breaks or spaces. The Integration
Service may interpret additional spaces as part of the value.
List all necessary mapping parameters and variables. Values entered for mapping parameters and
variables become the start value for parameters and variables in a mapping. Mapping parameter and
variable names are not case sensitive.
List all session parameters. Session parameters do not have default values. An undefined session
parameter can cause the session to fail. Session parameter names are not case sensitive.
Use correct date formats for datetime values. When entering datetime values, use the following date
formats:

MM/DD/RR
MM/DD/RR HH24:MI:SS
MM/DD/YYYY
MM/DD/YYYY HH24:MI:SS
Do not enclose parameter or variable values in quotes in the parameter file. The Integration Service interprets
everything after the equal sign as part of the value.
Do enclose parameters in single quotes within SQL. In a Source Qualifier SQL override, use single quotes if the
parameter represents a string or date/time value to be used in the SQL override.
Precede parameters and variables created in mapplets with the mapplet name as follows:

mapplet_name.parameter_name=value
mapplet2_name.variable_name=value

Sample: Parameter Files and Session Parameters


Parameter files, along with session parameters, allow you to change certain values between sessions. A
commonly used feature is the ability to create user-defined database connection session parameters to reuse
sessions for different relational sources or targets. Use session parameters in the session properties, and then
define the parameters in a parameter file. To do this, name all database connection session parameters with the
prefix $DBConnection, followed by any alphanumeric and underscore characters. Session parameters and
parameter files help reduce the overhead of creating multiple mappings when only certain attributes of a mapping
need to be changed.
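For instance, a session could reference $DBConnection_Source and $DBConnection_Target in its connection settings and resolve them from the parameter file (a minimal sketch; the session and connection names are hypothetical, and the Production folder follows the earlier example):

[Production.s_Load_Sales]
$DBConnection_Source=DEV_ORACLE
$DBConnection_Target=DEV_DW

Pointing the session at a different parameter file then redirects it to other connections without changing the session itself.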

Using Parameters in Source Qualifiers


Another commonly used feature is the ability to create parameters in source qualifiers, which allows the same
mapping to be reused with different sessions to extract the data specified in the parameter file each session
references. There may also be times when one mapping must create a parameter file and a second mapping must
use it: the second mapping pulls data using a parameter in its Source Qualifier transformation, reading the
parameter from the parameter file created by the first mapping. In that case, the idea is to build a mapping whose
target flat file is a parameter file for another session to use.

Sample: Variables and Parameters in an Incremental Strategy


Variables and parameters can enhance incremental strategies. The following example uses a mapping variable, an
expression transformation object, and a parameter file for restarting.
Scenario
Company X wants to start with an initial load of all data, but wants subsequent process runs to select only new
information. The source data has an inherent post date, stored in a column named Date_Entered, that can be
used. The process will run once every twenty-four hours.
Sample Solution
Create a mapping with source and target objects. From the menu create a new mapping variable named
$$Post_Date with the following attributes:
TYPE Variable
DATATYPE Date/Time
AGGREGATION TYPE MAX
INITIAL VALUE 01/01/1900
Note that there is no need to encapsulate the INITIAL VALUE with quotation marks. However, if this value is used
within the Source Qualifier SQL, it may be necessary to use native RDBMS functions to convert it (e.g., TO_DATE).
Within the Source Qualifier transformation, use the following in the Source Filter attribute (this sample assumes
Oracle as the source RDBMS):

DATE_ENTERED > to_date('$$Post_Date','MM/DD/YYYY HH24:MI:SS')

Also note that the initial value 01/01/1900 will be expanded by the Integration Service to 01/01/1900 00:00:00,
hence the need to convert the parameter to a datetime.
The next step is to forward $$Post_Date and Date_Entered to an Expression transformation. This is where the
function for setting the variable will reside. An output port named Post_Date is created with a data type of
date/time. In the expression code section, place the following function:

SETMAXVARIABLE($$Post_Date,DATE_ENTERED)
The function evaluates each value for DATE_ENTERED and updates the variable with the Max value to be passed
forward. For example:
DATE_ENTERED    Resultant POST_DATE
9/1/2000        9/1/2000
10/30/2001      10/30/2001
9/2/2000        10/30/2001

Consider the following with regard to the functionality:


1. In order for the function to assign a value, and ultimately store it in the repository, the port must be
connected to a downstream object. It need not go to the target, but it must go to another Expression
transformation. The reason is that the memory will not be instantiated unless it is used in a downstream
transformation object.
2. In order for the function to work correctly, the rows have to be marked for insert. If the mapping is an
update-only mapping (i.e., Treat Rows As is set to Update in the session properties), the function will not work.
In this case, make the session Data Driven and add an Update Strategy after the transformation containing the
SETMAXVARIABLE function, but before the target.
3. If the intent is to store the original Date_Entered per row and not the evaluated date value, then add an
ORDER BY clause to the Source Qualifier. This way, the dates are processed and set in order and data is
preserved.

The first time this mapping is run, the SQL will select from the source where Date_Entered is > 01/01/1900
providing an initial load. As data flows through the mapping, the variable gets updated to the Max Date_Entered it
encounters. Upon successful completion of the session, the variable is updated in the repository for use in the next
session run. To view the current value for a particular variable associated with the session, right-click on the
session in the Workflow Monitor and choose View Persistent Values.
The following graphic shows that after the initial run, the Max Date_Entered was 02/03/1998. The next time this
session is run, based on the variable in the Source Qualifier Filter, only sources where Date_Entered > 02/03/1998
will be processed.
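In effect, on the first run the source filter shown earlier expands to roughly the following (a sketch; the exact expansion depends on how the variable value is persisted):

DATE_ENTERED > to_date('01/01/1900 00:00:00','MM/DD/YYYY HH24:MI:SS')

On subsequent runs, the literal is replaced by the persisted maximum, 02/03/1998 in the example described below.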

Resetting or Overriding Persistent Values

To reset the persistent value to the initial value declared in the mapping, view the persistent value from Workflow
Manager (see graphic above) and press Delete Values. This deletes the stored value from the repository, causing
the Order of Evaluation to use the Initial Value declared from the mapping.
If a session run is needed for a specific date, use a parameter file. There are two basic ways to accomplish this:
Create a generic parameter file, place it on the server, and point all sessions to that parameter file. A
session may (or may not) have a variable, and the parameter file need not have variables and parameters
defined for every session using the parameter file. To override the variable, either change, uncomment, or
delete the variable in the parameter file.
Run pmcmd for that session, but declare the specific parameter file within the pmcmd command.

Configuring the Parameter File Location


Specify the parameter filename and directory in the workflow or session properties. To enter a parameter file in the
workflow or session properties:
Select either the Workflow or Session, choose Edit, and click the Properties tab.
Enter the parameter directory and name in the Parameter Filename field.
Enter either a direct path or a server variable directory. Use the appropriate delimiter for the Integration
Service operating system.
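For example, the Parameter Filename field might contain either a direct path or a path built from a server variable (both values below are hypothetical; $PMRootDir is the server variable used elsewhere in this document):

d:/parmfiles/s_MonthlyCalculations.parm
$PMRootDir/ParmFiles/s_MonthlyCalculations.parm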
The following graphic shows the parameter filename and location specified in the session task.

The next graphic shows the parameter filename and location specified in the Workflow.

In this example, after the initial session is run, the parameter file contents may look like:

[Test.s_Incremental]
;$$Post_Date=
By using the semicolon, the variable override is ignored and the Initial Value or Stored Value is used. If, in the
subsequent run, the data processing date needs to be set to a specific date (for example: 04/21/2001), then a
simple Perl script or manual change can update the parameter file to:

[Test.s_Incremental]
$$Post_Date=04/21/2001
Upon running the sessions, the order of evaluation looks to the parameter file first, sees a valid variable and value,
and uses that value for the session run. After successful completion, run another script to reset the parameter file.
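A minimal sketch of such a reset script on UNIX, assuming the parameter file is named s_Incremental.parm (a hypothetical name) and that re-commenting the line is the desired reset:

sed 's/^\$\$Post_Date=.*/;$$Post_Date=/' s_Incremental.parm > s_Incremental.tmp && mv s_Incremental.tmp s_Incremental.parm

This re-comments the $$Post_Date line so that the next run falls back to the value stored in the repository.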

Sample: Using Session and Mapping Parameters in Multiple Database Environments


Reusable mappings that can source a common table definition across multiple databases, regardless of differing
environmental definitions (e.g., instances, schemas, user/logins), are required in a multiple database environment.
Scenario
Company X maintains five Oracle database instances. All instances have a common table definition for sales
orders, but each instance has a unique instance name, schema, and login.
DB Instance   Schema     Table        User    Password
ORC1          aardso     orders       Sam     max
ORC99         environ    orders       Help    me
HALC          hitme      order_done   Hi      Lois
UGLY          snakepit   orders       Punch   Judy
GORF          gmer       orders       Brer    Rabbit

Each sales order table has a different name, but the same definition:
ORDER_ID         NUMBER (28)    NOT NULL,
DATE_ENTERED     DATE           NOT NULL,
DATE_PROMISED    DATE           NOT NULL,
DATE_SHIPPED     DATE           NOT NULL,
EMPLOYEE_ID      NUMBER (28)    NOT NULL,
CUSTOMER_ID      NUMBER (28)    NOT NULL,
SALES_TAX_RATE   NUMBER (5,4)   NOT NULL,
STORE_ID         NUMBER (28)    NOT NULL

Sample Solution
Using Workflow Manager, create multiple relational connections. In this example, the strings are named according
to the DB Instance name. Using Designer, create the mapping that sources the commonly defined table. Then
create a Mapping Parameter named $$Source_Schema_Table with the following attributes:

Note that the parameter attributes vary based on the specific environment. Also, the initial value is not required
since this solution uses parameter files.
Open the Source Qualifier and use the mapping parameter in the SQL Override as shown in the following graphic.

Open the Expression Editor and select Generate SQL. The generated SQL statement shows the columns.
Override the table names in the SQL statement with the mapping parameter.
Using Workflow Manager, create a session based on this mapping. Within the Source Database connection
drop-down box, choose the following parameter:

$DBConnection_Source

Point the target to the corresponding target and finish.
Now create the parameter files. In this example, there are five separate parameter files.

Parmfile1.txt

[Test.s_Incremental_SOURCE_CHANGES]
$$Source_Schema_Table=aardso.orders
$DBConnection_Source=ORC1
Parmfile2.txt

[Test.s_Incremental_SOURCE_CHANGES]

$$Source_Schema_Table=environ.orders
$DBConnection_Source=ORC99
Parmfile3.txt

[Test.s_Incremental_SOURCE_CHANGES]
$$Source_Schema_Table=hitme.order_done
$DBConnection_Source=HALC
Parmfile4.txt

[Test.s_Incremental_SOURCE_CHANGES]
$$Source_Schema_Table=snakepit.orders
$DBConnection_Source=UGLY
Parmfile5.txt

[Test.s_Incremental_SOURCE_CHANGES]
$$Source_Schema_Table=gmer.orders
$DBConnection_Source=GORF
Use pmcmd to run the five sessions in parallel. The syntax for starting a workflow with pmcmd and a particular
parameter file is as follows:

pmcmd startworkflow -s serveraddress:portno -u Username -p Password -paramfile parmfilename s_Incremental
You may also use "-pv pwdvariable" if the named environment variable contains the encrypted form of the actual
password.

Notes on Using Parameter Files with Startworkflow


When starting a workflow, you can optionally enter the directory and name of a parameter file. The PowerCenter
Integration Service runs the workflow using the parameters in the file specified. For UNIX shell users, enclose the
parameter file name in single quotes:

-paramfile '$PMRootDir/myfile.txt'
For Windows command prompt users, the parameter file name cannot have beginning or trailing spaces. If the
name includes spaces, enclose the file name in double quotes:

-paramfile "$PMRootDir\my file.txt"


Note: When writing a pmcmd command that includes a parameter file located on another machine, use the
backslash (\) with the dollar sign ($). This ensures that the machine where the variable is defined expands the
server variable.

pmcmd startworkflow -uv USERNAME -pv PASSWORD -s SALES:6258 -f east -w wSalesAvg -paramfile
'\$PMRootDir/myfile.txt'
In the event that it is necessary to run the same workflow with different parameter files, use the following five
separate commands:

pmcmd startworkflow -u tech_user -p pwd -s 127.0.0.1:4001 -f Test s_Incremental_SOURCE_CHANGES -paramfile \$PMRootDir\ParmFiles\Parmfile1.txt 1 1
pmcmd startworkflow -u tech_user -p pwd -s 127.0.0.1:4001 -f Test s_Incremental_SOURCE_CHANGES -paramfile \$PMRootDir\ParmFiles\Parmfile2.txt 1 1
pmcmd startworkflow -u tech_user -p pwd -s 127.0.0.1:4001 -f Test s_Incremental_SOURCE_CHANGES -paramfile \$PMRootDir\ParmFiles\Parmfile3.txt 1 1
pmcmd startworkflow -u tech_user -p pwd -s 127.0.0.1:4001 -f Test s_Incremental_SOURCE_CHANGES -paramfile \$PMRootDir\ParmFiles\Parmfile4.txt 1 1
pmcmd startworkflow -u tech_user -p pwd -s 127.0.0.1:4001 -f Test s_Incremental_SOURCE_CHANGES -paramfile \$PMRootDir\ParmFiles\Parmfile5.txt 1 1
Alternatively, run the sessions in sequence with one parameter file. In this case, a pre- or post-session script can
change the parameter file for the next session.
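A minimal sketch of such a post-session command on UNIX, assuming the session always reads a fixed file name (CurrentParm.txt is a hypothetical name) and that $PMRootDir resolves on the server:

cp $PMRootDir/ParmFiles/Parmfile2.txt $PMRootDir/ParmFiles/CurrentParm.txt

Each run's post-session command copies the next file in the sequence over the file name that the session is configured to read.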

Dynamically Creating Parameter Files with a Mapping


Using advanced techniques, a PowerCenter mapping can be built that produces, as its target file, a parameter file
(.parm) that can be referenced by other mappings and sessions. When many mappings use the same parameter
file, it is desirable to be able to easily re-create the file when mapping parameters are changed or updated. This
can also be beneficial when parameters change from run to run. There are a few different methods of creating a
parameter file with a mapping.
There is a mapping template example on my.informatica.com that illustrates a method of using a PowerCenter
mapping to source from a process table containing mapping parameters and to create a parameter file. The same
result can also be achieved by sourcing a flat file in parameter file format with code characters in the fields to
be altered.

[folder_name.session_name]
parameter_name= <parameter_code>
variable_name=value

mapplet_name.parameter_name=value
[folder2_name.session_name]
parameter_name= <parameter_code>
variable_name=value
mapplet_name.parameter_name=value
In place of the text <parameter_code> one could place the text filename_<timestamp>.dat. The mapping would
then perform a string replace wherever the text <timestamp> occurred, and the output might look like:

Src_File_Name=filename_20080622.dat
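One way to perform that replacement inside the mapping is with an Expression transformation (a sketch, not the template's actual implementation; PARM_LINE is a hypothetical input port carrying one line of the sourced parameter file template):

REPLACESTR(0, PARM_LINE, '<timestamp>', TO_CHAR(SESSSTARTTIME, 'YYYYMMDD'))

The result is written to the target flat file, producing lines such as the one shown above.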
This method works well when values change often and different parameter groupings use different parameter sets.
The overall benefit is that if many mappings use the same parameter file, changes can be made by updating the
source table and re-creating the file, which is faster than manually updating the file line by line.

Final Tips for Parameters and Parameter Files


Use a single parameter file to group parameter information for related sessions.
When sessions are likely to use the same database connection or directory, you might want to include them in the
same parameter file. When connections or directories change, you can update information for all sessions by
editing one parameter file. Sometimes you reuse session parameters in a cycle. For example, you might run a
session against a sales database every day, but run the same session against sales and marketing databases once
a week. You can create separate parameter files for each session run. Instead of changing the parameter file in the
session properties each time you run the weekly session, use pmcmd to specify the parameter file to use when you
start the session.
Use reject file and session log parameters in conjunction with target file or target database connection
parameters.
When you use a target file or target database connection parameter with a session, you can keep track of reject
files by using a reject file parameter. You can also use the session log parameter to write the session log to the
target machine.
Use a resource to verify the session runs on a node that has access to the parameter file.
In the Administration Console, you can define a file resource for each node that has access to the parameter file
and configure the Integration Service to check resources. Then, edit the session that uses the parameter file and
assign the resource. When you run the workflow, the Integration Service runs the session with the required
resource on a node that has the resource available.
Save all parameter files in one of the process variable directories.
If you keep all parameter files in one of the process variable directories, such as $SourceFileDir, use the process
variable in the session property sheet. If you need to move the source and parameter files at a later date, you can
update all sessions by changing the process variable to point to the new directory.

Session and Data Partitioning


Challenge
Improving performance by identifying strategies for partitioning relational tables, XML, COBOL and standard flat
files, and by coordinating the interaction between sessions, partitions, and CPUs. These strategies take advantage
of the enhanced partitioning capabilities in PowerCenter.

Description
On hardware systems that are under-utilized, you may be able to improve performance by processing partitioned
data sets in parallel in multiple threads of the same session instance running on the PowerCenter Server engine.
However, parallel execution may impair performance on over-utilized systems or systems with smaller I/O capacity.
In addition to hardware, consider these other factors when determining if a session is an ideal candidate for
partitioning: source and target database setup, target type, mapping design, and certain assumptions that are
explained in the following paragraphs. Use the Workflow Manager client tool to implement session partitioning.

Assumptions
The following assumptions pertain to the source and target systems of a session that is a candidate for partitioning.
These factors can help to maximize the benefits that can be achieved through partitioning.
Indexing has been implemented on the partition key when using a relational source.
Source files are located on the same physical machine as the PowerCenter Server process when
partitioning flat files, COBOL, and XML, to reduce network overhead and delay.
All possible constraints are dropped or disabled on relational targets.
All possible indexes are dropped or disabled on relational targets.
Table spaces and database partitions are properly managed on the target system.
Target files are written to same physical machine that hosts the PowerCenter process in order to reduce
network overhead and delay.
Oracle External Loaders are utilized whenever possible.
First, determine if you should partition your session. Parallel execution benefits systems that have the following
characteristics:
Check idle time and busy percentage for each thread. This gives the high-level information of the bottleneck
point/points. In order to do this, open the session log and look for messages starting with PETL_ under the RUN
INFO FOR TGT LOAD ORDER GROUP section. These PETL messages give the following details against the
reader, transformation, and writer threads:
Total Run Time
Total Idle Time
Busy Percentage
Under-utilized or intermittently-used CPUs. To determine if this is the case, check the CPU usage of your
machine. The idle percentage (the id column in vmstat output) displays the percentage of time the CPU was idle
during the specified interval without any I/O wait. If there are CPU cycles available (i.e., twenty percent or more
idle time), then this session's performance may be improved by adding a partition.
Windows 2000/2003 - check the task manager performance tab.
UNIX - type vmstat 1 10 on the command line.
Sufficient I/O. To determine the I/O statistics:
Windows 2000/2003 - check the task manager performance tab.
UNIX - type iostat on the command line. The %iowait column displays the percentage of CPU time
spent idling while waiting for I/O requests. The %idle column displays the total percentage of the time that
the CPU spends idling (i.e., the unused capacity of the CPU).
Sufficient memory. If too much memory is allocated to your session, you will receive a memory allocation error.
Check to see that you're using as much memory as you can. If the session is paging, increase the memory. To
determine if the session is paging:
Windows 2000/2003 - check the task manager performance tab.
UNIX - type vmstat 1 10 on the command line. The pi column displays the number of pages swapped in from
the page space during the specified interval, and the po column displays the number of pages swapped out to
the page space during the specified interval. If these values indicate that paging is occurring, it may be
necessary to allocate more memory, if possible.
If you determine that partitioning is practical, you can begin setting up the partition.

Partition Types
PowerCenter provides increased control of the pipeline threads. Session performance can be improved by adding
partitions at various pipeline partition points. When you configure the partitioning information for a pipeline, you
must specify a partition type. The partition type determines how the PowerCenter Server redistributes data across
partition points. The Workflow Manager allows you to specify the following partition types:

Round-robin Partitioning
The PowerCenter Server distributes data evenly among all partitions. Use round-robin partitioning when you need
to distribute rows evenly and do not need to group data among partitions.
In a pipeline that reads data from file sources of different sizes, use round-robin partitioning. For example, consider
a session based on a mapping that reads data from three flat files of different sizes.
Source file 1: 100,000 rows
Source file 2: 5,000 rows
Source file 3: 20,000 rows
In this scenario, the recommended best practice is to set a partition point after the Source Qualifier and set the
partition type to round-robin. The PowerCenter Server distributes the data so that each partition processes
approximately one third of the data.

Hash Partitioning
The PowerCenter Server applies a hash function to a partition key to group data among partitions.
Use hash partitioning when you want to ensure that the PowerCenter Server processes groups of rows with the
same partition key in the same partition. For example, you might need to sort items by item ID but not know how
many items have a particular ID number. If you select hash auto-keys, the PowerCenter Server uses all grouped
or sorted ports as the partition key. If you select hash user keys, you specify a number of ports to form the
partition key.
An example of this type of partitioning is when you are using Aggregators and need to ensure that groups of data
based on a primary key are processed in the same partition.

Key Range Partitioning


With this type of partitioning, you specify one or more ports to form a compound partition key for a source or target.
The PowerCenter Server then passes data to each partition depending on the ranges you specify for each port.
Use key range partitioning where the sources or targets in the pipeline are partitioned by key range. Refer to
the Workflow Administration Guide for further directions on setting up key range partitions.
For example, with key range partitioning set at End range = 2020, the PowerCenter Server passes in data where
values are less than 2020. Similarly, for Start range = 2020, the PowerCenter Server passes in data where values
are equal to or greater than 2020. Null values or values that do not fall in either partition are passed through the
first partition.

Pass-through Partitioning
In this type of partitioning, the PowerCenter Server passes all rows at one partition point to the next partition point
without redistributing them.
Use pass-through partitioning where you want to create an additional pipeline stage to improve performance, but
do not want to (or cannot) change the distribution of data across partitions. The Data Transformation Manager
spawns a master thread on each session run, which in turn creates three threads (reader, transformation, and
writer threads) by default. Each of these threads can, at the most, process one data set at a time and hence, three
data sets simultaneously. If there are complex transformations in the mapping, the transformation thread may take
a longer time than the other threads, which can slow data throughput.
It is advisable to define partition points at these transformations. This creates another pipeline stage and reduces
the overhead of a single transformation thread.
When you have considered all of these factors and selected a partitioning strategy, you can begin the iterative
process of adding partitions. Continue adding partitions to the session until you meet the desired performance
threshold or observe degradation in performance.

Tips for Efficient Session and Data Partitioning


Add one partition at a time. To best monitor performance, add one partition at a time, and note your
session settings before adding additional partitions. Refer to the Workflow Administration Guide for more
information on restrictions on the number of partitions.
Set DTM buffer memory. For a session with n partitions, set this value to at least n times the original value
for the non-partitioned session.
Set cached values for sequence generator. For a session with n partitions, there is generally no need to
use the Number of Cached Values property of the sequence generator. If you must set this value to a value
greater than zero, make sure it is at least n times the original value for the non-partitioned session.
Partition the source data evenly. The source data should be partitioned into equal sized chunks for each
partition.
Partition tables. A notable increase in performance can also be realized when the actual source and target
tables are partitioned. Work with the DBA to discuss the partitioning of source and target tables, and the
setup of tablespaces.
Consider using external loader. As with any session, using an external loader may increase session
performance. You can only use Oracle external loaders for partitioning. Refer to the Session and Server
Guide for more information on using and setting up the Oracle external loader for partitioning.
Write throughput. Check the session statistics to see if you have increased the write throughput.
Paging. Check to see if the session is now causing the system to page. When you partition a session and
there are cached lookups, you must make sure that DTM memory is increased to handle the lookup caches.
When you partition a source that uses a static lookup cache, the PowerCenter Server creates one memory
cache for each partition and one disk cache for each transformation. Thus, memory requirements grow for
each partition. If the memory is not bumped up, the system may start paging to disk, causing degradation in
performance.
When you finish partitioning, monitor the session to see if the partition is degrading or improving session
performance. If the session performance is improved and the session meets your requirements, add another
partition.

Session on Grid and Partitioning Across Nodes


Session on Grid provides the ability to run a session on a multi-node Integration Service. This is most suitable for
large sessions. For small and medium-sized sessions, it is more practical to distribute whole sessions to
different nodes using Workflow on Grid. Session on Grid leverages the existing partitions of a session by executing
threads in multiple DTMs. The Log Service can be used to get the cumulative log. See the PowerCenter Enterprise
Grid Option documentation for detailed configuration information.

Dynamic Partitioning
Dynamic partitioning is also called parameterized partitioning because a single parameter can determine the
number of partitions. With the Session on Grid option, more partitions can be added when more resources are
available. Also, the number of partitions in a session can be tied to partitions in the database, making it easier to
keep PowerCenter partitioning aligned with database partitioning.

Real-Time Integration with PowerCenter


Challenge
Configure PowerCenter to work with various PowerExchange data access products to process real-time data. This
Best Practice discusses guidelines for establishing a connection with PowerCenter and setting up a real-time
session to work with PowerCenter.

Description
PowerCenter with real-time option can be used to process data from real-time data sources. PowerCenter supports
the following types of real-time data:
Messages and message queues. PowerCenter with the real-time option can be used to integrate third-party messaging applications using a specific PowerExchange data access product. Each PowerExchange
product supports a specific industry-standard messaging application, such as WebSphere MQ, JMS,
MSMQ, SAP NetWeaver, TIBCO, and webMethods. You can read from messages and message queues
and write to messages, messaging applications, and message queues. WebSphere MQ uses a queue to
store and exchange data. Other applications, such as TIBCO and JMS, use a publish/subscribe model. In
this case, the message exchange is identified using a topic.
Web service messages. PowerCenter can receive a web service message from a web service client
through the Web Services Hub, transform the data, and load the data to a target or send a message back to
a web service client. A web service message is a SOAP request from a web service client or a SOAP
response from the Web Services Hub. The Integration Service processes real-time data from a web service
client by receiving a message request through the Web Services Hub and processing the request. The
Integration Service can send a reply back to the web service client through the Web Services Hub or write
the data to a target.
Changed source data. PowerCenter can extract changed data in real time from a source table using the
PowerExchange Listener and write data to a target. Real-time sources supported by PowerExchange are
ADABAS, DATACOM, DB2/390, DB2/400, DB2/UDB, IDMS, IMS, MS SQL Server, Oracle and VSAM.

Connection Setup
PowerCenter uses some attribute values in order to correctly connect and identify the third-party messaging
application and message itself. Each PowerExchange product supplies its own connection attributes that need to
be configured properly before running a real-time session.

Setting Up Real-Time Session in PowerCenter


The PowerCenter real-time option uses a zero latency engine to process data from the messaging system.
Depending on the messaging systems and the application that sends and receives messages, there may be a
period when there are many messages and, conversely, there may be a period when there are no messages.
PowerCenter uses the attribute Flush Latency to determine how often the messages are being flushed to the
target. PowerCenter also provides various attributes to control when the session ends.
The following reader attributes determine when a PowerCenter session should end:

Message Count - Controls the number of messages the PowerCenter Server reads from the source before
the session stops reading from the source.
Idle Time - Indicates how long the PowerCenter Server waits when no messages arrive before it stops
reading from the source.
Time Slice Mode - Indicates a specific range of time during which the server reads messages from the source.
Only PowerExchange for WebSphere MQ uses this option.
Reader Time Limit - Indicates the number of seconds the PowerCenter Server spends reading messages
from the source.
The specific filter conditions and options available depend on which real-time source is being used (for
example, the attributes for PowerExchange for DB2 for i5/OS).

Set the attributes that control how the reader ends. One or more attributes can be used to control the end of
session.
For example, set the Reader Time Limit attribute to 3600 and the reader will end after 3600 seconds. If the idle
time limit is set to 500 seconds, the reader will end if it doesn't process any changes for 500 seconds (i.e., it
remains idle for 500 seconds).
If more than one attribute is selected, the first attribute that satisfies the condition is used to control the end of
session.
Note: The real-time attributes can be found in the Reader Properties for PowerExchange for JMS, TIBCO,
webMethods, and SAP iDoc. For PowerExchange for WebSphere MQ, the real-time attributes must be specified
as a filter condition.
The next step is to set the Real-time Flush Latency attribute. The Flush Latency defines how often PowerCenter
should flush messages, expressed in milliseconds.
For example, if the Real-time Flush Latency is set to 2000, PowerCenter flushes messages every two seconds.
The messages will also be flushed from the reader buffer if the Source Based Commit condition is reached. The
Source Based Commit condition is defined in the Properties tab of the session.
The message recovery option can be enabled to ensure that no messages are lost if a session fails as a result of
unpredictable error, such as power loss. This is especially important for real-time sessions because some
messaging applications do not store the messages after the messages are consumed by another application.
A unit of work (UOW) is a collection of changes within a single commit scope made by a transaction on the source
system from an external application. Each UOW may consist of a different number of rows depending on the
transaction to the source system. When you use the UOW Count Session condition, the Integration Service
commits source data to the target when it reaches the number of UOWs specified in the session condition.
For example, if the value for UOW Count is 10, the Integration Service commits all data read from the source after
the 10th UOW enters the source. The lower you set the value, the faster the Integration Service commits data to
the target. The lower value also causes the system to consume more resources.

Executing a Real-Time Session


A real-time session often has to be up and running continuously to listen to the messaging application and to
process messages immediately after the messages arrive. Set the reader attribute Idle Time to -1 and Flush
Latency to a specific time interval. This is applicable for all PowerExchange products except for PowerExchange
for WebSphere MQ where the session continues to run and flush the messages to the target using the specific
flush latency interval.
Another scenario is the ability to read data from another source system and immediately send it to a real-time
target. For example, reading data from a relational source and writing it to WebSphere MQ. In this case, set the
session to run continuously so that every change in the source system can be immediately reflected in the target.
A real-time session may run continuously until a condition is met to end the session. In some situations it may be
required to periodically stop the session and restart it. This is sometimes necessary to execute a post-session
command or run some other process that is not part of the session. To stop the session and restart it, it is useful to
deploy continuously running workflows. The Integration Service starts the next run of a continuous workflow as
soon as it completes the first.
To set a workflow to run continuously, edit the workflow and select the Scheduler tab. Edit the Scheduler and
select Run Continuously from Run Options. A continuous workflow starts automatically when the Integration
Service initializes. When the workflow stops, it restarts immediately.

Real-Time Sessions and Active Transformations


Some of the transformations in PowerCenter are active transformations, which means that the number of input
rows and output rows of the transformation are not the same. In most cases, an active transformation requires all
of the input rows to be processed before passing output rows to the next transformation or target. For a real-time
session, the flush latency will be ignored if the DTM needs to wait for all the rows to be processed.
Depending on user needs, active transformations, such as aggregator, rank, sorter can be used in a real-time
session by setting the transaction scope property in the active transformation to Transaction. This signals the
session to process the data in the transformation every transaction. For example, if a real-time session is using an
aggregator that sums a field of an input, the summation will be done per transaction, as opposed to all rows. The

result may or may not be correct depending on the requirement. Use the active transformation with real-time
session if you want to process the data per transaction.
Custom transformations can also be defined to handle data per transaction so that they can be used in a real-time
session.

PowerExchange Real Time Connections


PowerExchange NRDB CDC Real Time connections can be used to extract changes from ADABAS, DATACOM,
IDMS, IMS and VSAM sources in real time.
The DB2/390 connection can be used to extract changes for DB2 on OS/390 and the DB2/400 connection to
extract from AS/400. There is a separate connection to read from DB2 UDB in real time.
The NRDB CDC connection requires the application name and the restart token file name to be overridden for
every session. When the PowerCenter session completes, the PowerCenter Server writes the last restart token to
a physical file called the RestartToken File. The next time the session starts, the PowerCenter Server reads the
restart token from the file and then starts reading changes from the point where it last left off. Every PowerCenter
session needs to have a unique restart token filename.
Informatica recommends archiving the file periodically. The reader timeout or the idle timeout can be used to stop a
real-time session. A post-session command can be used to archive the RestartToken file.
The encryption mode for this connection can slow down the read performance and increase resource consumption.
Compression mode can help in situations where the network is a bottleneck; using compression also increases the
CPU and memory usage on the source system.

Archiving PowerExchange Tokens


When the PowerCenter session completes, the Integration Service writes the last restart token to a physical file
called the RestartToken File. The token in the file indicates the end point where the read job ended. The next time
the session starts, the PowerCenter Server reads the restart token from the file and then starts reading changes
from the point where it left off. The token file is overwritten each time the session has to write a token out.
PowerCenter does not implicitly maintain an archive of these tokens.
If, for some reason, the changes from a particular point in time have to be replayed, we need the PowerExchange
token from that point in time.
To enable such a process, it is a good practice to periodically copy the token file to a backup folder. This procedure
is necessary to maintain an archive of the PowerExchange tokens. A real-time PowerExchange session may be
stopped periodically, using either the reader time limit or the idle time limit. A post-session command is used to
copy the restart token file to an archive folder. The session will be part of a continuous running workflow, so when
the session completes after the post session command, it automatically restarts again. From a data processing
standpoint very little changes; the process pauses for a moment, archives the token, and starts again.
The following are examples of post-session commands that can be used to copy a restart token file (session.token)
and append the current system date/time to the file name for archive purposes:
UNIX:
cp session.token session`date '+%m%d%H%M'`.token
Windows:
copy session.token session-%date:~4,2%-%date:~7,2%-%date:~10,4%-%time:~0,2%-%time:~3,2%.token

PowerExchange for WebSphere MQ


1. In the Workflow Manager, connect to a repository and choose Connection > Queue
2. The Queue Connection Browser appears. Select New > Message Queue
3. The Connection Object Definition dialog box appears
You need to specify three attributes in the Connection Object Definition dialog box:
Name - the name for the connection. (Use <queue_name>_<QM_name> to uniquely identify the
connection.)
Queue Manager - the Queue Manager name for the message queue. (in Windows, the default Queue
Manager name is QM_<machine name>)
Queue Name - the Message Queue name
To obtain the Queue Manager and Message Queue names:
Open the MQ Series Administration Console. The Queue Manager should appear on the left panel
Expand the Queue Manager icon. A list of the queues for the queue manager appears on the left panel
Note that the Queue Manager's name and Queue Name are case-sensitive.

PowerExchange for JMS


PowerExchange for JMS can be used to read or write messages from various JMS providers, such as WebSphere
MQ JMS and BEA WebLogic Server.
There are two types of JMS application connections:
JNDI Application Connection, which is used to connect to a JNDI server during a session run.
JMS Application Connection, which is used to connect to a JMS provider during a session run.
JNDI Application Connection Attributes are:
Name
JNDI Context Factory
JNDI Provider URL
JNDI UserName
JNDI Password
JMS Application Connection Attributes are:
Name
JMS Destination Type
JMS Connection Factory Name
JMS Destination
JMS UserName
JMS Password

Configuring the JNDI Connection for WebSphere MQ


The JNDI settings for WebSphere MQ JMS can be configured using a file system service or LDAP (Lightweight
Directory Access Protocol).
The JNDI setting is stored in a file named JMSAdmin.config. The file should be installed in the WebSphere MQ

Java installation/bin directory.


If you are using a file system service provider to store your JNDI settings, remove the number sign (#) before the
following context factory setting:
INITIAL_CONTEXT_FACTORY=com.sun.jndi.fscontext.RefFSContextFactory
Or, if you are using the LDAP service provider to store your JNDI settings, remove the number sign (#) before the
following context factory setting:
INITIAL_CONTEXT_FACTORY=com.sun.jndi.ldap.LdapCtxFactory
Find the PROVIDER_URL settings.
If you are using a file system service provider to store your JNDI settings, remove the number sign (#) before the
following provider URL setting and provide a value for the JNDI directory.
PROVIDER_URL=file:/<JNDI directory>
<JNDI directory> is the directory where you want JNDI to store the .binding file.
Or, if you are using the LDAP service provider to store your JNDI settings, remove the number sign (#) before the
provider URL setting and specify a hostname.
#PROVIDER_URL=ldap://<hostname>/context_name
For example, you can specify:
PROVIDER_URL=ldap://<localhost>/o=infa,c=rc
If you want to provide a user DN and password for connecting to JNDI, you can remove the # from the following
settings and enter a user DN and password:
PROVIDER_USERDN=cn=myname,o=infa,c=rc
PROVIDER_PASSWORD=test
The following table shows the JMSAdmin.config settings and the corresponding attributes in the JNDI application
connection in the Workflow Manager:

JMSAdmin.config Setting       JNDI Application Connection Attribute
INITIAL_CONTEXT_FACTORY       JNDI Context Factory
PROVIDER_URL                  JNDI Provider URL
PROVIDER_USERDN               JNDI UserName
PROVIDER_PASSWORD             JNDI Password

Configuring the JMS Connection for WebSphere MQ

The JMS connection is defined using a tool in JMS called jmsadmin, which is available in the WebSphere MQ Java
installation/bin directory. Use this tool to configure the JMS Connection Factory.
The JMS Connection Factory can be a Queue Connection Factory or Topic Connection Factory.
When Queue Connection Factory is used, define a JMS queue as the destination.
When Topic Connection Factory is used, define a JMS topic as the destination.
The command to define a queue connection factory (qcf) is:
def qcf(<qcf_name>) qmgr(queue_manager_name)
hostname (QM_machine_hostname) port (QM_machine_port)
The command to define JMS queue is:
def q(<JMS_queue_name>) qmgr(queue_manager_name) qu(queue_manager_queue_name)
The command to define JMS topic connection factory (tcf) is:
def tcf(<tcf_name>) qmgr(queue_manager_name)
hostname (QM_machine_hostname) port (QM_machine_port)
The command to define the JMS topic is:
def t(<JMS_topic_name>) topic(pub/sub_topic_name)
The topic name must be unique. For example: topic (application/infa)
The following table shows the JMS object types and the corresponding attributes in the JMS application connection
in the Workflow Manager:

JMS Object Type                                      JMS Application Connection Attribute
QueueConnectionFactory or TopicConnectionFactory     JMS Connection Factory Name
JMS Queue Name or JMS Topic Name                     JMS Destination

Configure the JNDI and JMS Connection for WebSphere


Configure the JNDI settings for WebSphere to use WebSphere as a provider for JMS sources or targets in a
PowerCenterRT session.
JNDI Connection
Add the following option to the file JMSAdmin.bat to configure JMS properly:
-Djava.ext.dirs=<WebSphere Application Server directory>\bin
For example: -Djava.ext.dirs=WebSphere\AppServer\bin
The JNDI connection resides in the JMSAdmin.config file, which is located in the MQ Series Java/bin directory.

INITIAL_CONTEXT_FACTORY=com.ibm.websphere.naming.wsInitialContextFactory
PROVIDER_URL=iiop://<hostname>/
For example:
PROVIDER_URL=iiop://localhost/
PROVIDER_USERDN=cn=informatica,o=infa,c=rc
PROVIDER_PASSWORD=test
JMS Connection
The JMS configuration is similar to the JMS Connection for WebSphere MQ.

Configure the JNDI and JMS Connection for BEA WebLogic


Configure the JNDI settings for BEA WebLogic to use BEA WebLogic as a provider for JMS sources or targets in
a PowerCenterRT session.
PowerCenter Connect for JMS and the JMS-hosting WebLogic Server do not need to be on the same machine.
PowerCenter Connect for JMS just needs a URL, as long as the URL points to the right place.
JNDI Connection
The WebLogic Server automatically provides a context factory and URL during the JNDI set-up configuration for
WebLogic Server. Enter these values to configure the JNDI connection for JMS sources and targets in the
Workflow Manager.
Enter the following value for JNDI Context Factory in the JNDI Application Connection in the Workflow Manager:
weblogic.jndi.WLInitialContextFactory
Enter the following value for JNDI Provider URL in the JNDI Application Connection in the Workflow Manager:
t3://<WebLogic_Server_hostname>:<port>
where WebLogic Server hostname is the hostname or IP address of the WebLogic Server and port is the port
number for the WebLogic Server.
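For example, a hypothetical value might be:
t3://weblogichost01:7001
where weblogichost01 is a placeholder host name and 7001 is the default WebLogic Server listen port; use whatever port your WebLogic Server is actually configured to listen on.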
JMS Connection
The JMS connection is configured from the BEA WebLogic Server console. Select JMS > Connection Factory.
The JMS Destination is also configured from the BEA WebLogic Server console.
From the Console pane, select Services > JMS > Servers > <JMS Server name> > Destinations under your
domain.
Click Configure a New JMSQueue or Configure a New JMSTopic.
The following table shows the JMS object types and the corresponding attributes in the JMS application connection
in the Workflow Manager:

WebLogic Server JMS Object                JMS Application Connection Attribute
Connection Factory Settings: JNDIName     JMS Connection Factory Name
Destination Settings: JNDIName            JMS Destination

In addition to the JNDI and JMS settings, BEA WebLogic also offers a feature called JMS Stores, which can be used for
persistent messaging when reading and writing JMS messages. The JMS Stores configuration is available from the
Console pane: select Services > JMS > Stores under your domain.

Configuring the JNDI and JMS Connection for TIBCO


TIBCO Rendezvous Server does not adhere to the JMS specification. As a result, PowerCenter Connect for JMS
cannot connect directly to the Rendezvous Server. TIBCO Enterprise Server, which is JMS-compliant, acts as a
bridge between PowerCenter Connect for JMS and TIBCO Rendezvous Server. Configure a connection-bridge
between TIBCO Rendezvous Server and TIBCO Enterprise Server so that PowerCenter Connect for JMS can
read messages from and write messages to TIBCO Rendezvous Server.
To create a connection-bridge between PowerCenter Connect for JMS and TIBCO Rendezvous Server, follow
these steps:
1. Configure PowerCenter Connect for JMS to communicate with TIBCO Enterprise Server.
2. Configure TIBCO Enterprise Server to communicate with TIBCO Rendezvous Server.
Configure the following information in your JNDI application connection:
JNDI Context Factory: com.tibco.tibjms.naming.TibjmsInitialContextFactory
JNDI Provider URL: tibjmsnaming://<host>:<port>, where host and port are the host name and port number of the
Enterprise Server.
To make a connection-bridge between TIBCO Rendezvous Server and TIBCO Enterprise Server:
1. In the file tibjmsd.conf, enable the tibrv transport configuration parameter as in the example below, so that
TIBCO Enterprise Server can communicate with TIBCO Rendezvous messaging systems:

tibrv_transports = enabled
2. Enter the following transports in the transports.conf file:

[RV]
type = tibrv // type of external messaging system
topic_import_dm = TIBJMS_RELIABLE // only reliable/certified messages can transfer
daemon = tcp:localhost:7500 // default daemon for the Rendezvous server
The transports in the transports.conf configuration file specify the communication protocol between TIBCO
Enterprise for JMS and the TIBCO Rendezvous system. The import and export properties on a destination
can list one or more transports to use to communicate with the TIBCO Rendezvous system.

3. Optionally, specify the name of one or more transports for reliable and certified message delivery in the
export property in the topics.conf file, as in the example below.
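The following is a minimal sketch of such an entry; the topic name is a placeholder and the exact topics.conf syntax can vary between TIBCO Enterprise for JMS releases, so verify it against the TIBCO documentation for your version:

sample.topic export="RV"

Here, RV refers to the transport name defined in transports.conf above.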

The export property allows messages published to a topic by a JMS client to be exported to the external systems
with configured transports. Currently, you can configure transports for TIBCO Rendezvous reliable and certified
messaging protocols.

PowerExchange for webMethods


When importing webMethods sources into the Designer, be sure the host name used for the webMethods Broker does
not contain a period (.) character. You cannot use fully-qualified host names for the connection when importing
webMethods sources. You can use fully-qualified names for the connection when importing webMethods targets
because PowerCenter does not use the same grouping method for importing sources and targets. To get around this,
modify the hosts file so that the short host name resolves to the correct IP address.
For example, if the hosts file maps the same IP address to both crpc23232.crp.informatica.com and the short name
crpc23232, use crpc23232 instead of crpc23232.crp.informatica.com as the host name when importing the
webMethods source definition. This step is only required when importing PowerExchange for webMethods sources
into the Designer.
If you are using the request/reply model in webMethods, PowerCenter needs to send an appropriate document
back to the broker for every document it receives. PowerCenter populates some of the envelope fields of the
webMethods target to enable webMethods broker to recognize that the published document is a reply from
PowerCenter. The envelope fields destid and tag are populated for the request/reply model: destid should be
populated from the pubid of the source document, and tag should be populated from the tag of the source
document. Use the Create Default Envelope Fields option when importing webMethods sources and targets into
the Designer in order to make the envelope fields available in PowerCenter.

Configuring the PowerExchange for webMethods Connection


To create or edit the PowerExchange for webMethods connection, select Connections > Application > webMethods
Broker in the Workflow Manager.
PowerExchange for webMethods connection attributes are:
Name
Broker Host
Broker Name
Client ID
Client Group
Application Name
Automatic Reconnect
Preserve Client State
Enter the connection to the Broker Host in the following format: <hostname>:<port>.
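For example, a hypothetical entry could be crpc23232:6849, where crpc23232 is a placeholder Broker host name and 6849 is the default webMethods Broker port; use the port configured for your Broker if it differs.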
If you are using the request/reply method in webMethods, you have to specify a client ID in the connection. Be sure
that the client ID used in the request connection is the same as the client ID used in the reply connection. Note that
if you are using multiple request/reply document pairs, you need to set up different webMethods connections for
each pair because they cannot share a client ID.

Master Data Management Architecture with Informatica

Master Data Management Architecture with Informatica
Challenge
Data integration is critical to managing the modern business environment because companies find themselves with
multiple redundant systems that contain master data built on differing data models and data definitions. This creates
a data governance challenge: orchestrating people, policies, procedures and technology to manage enterprise data
availability, usability, integrity and security for business process efficiency and compliance.
Master data management addresses four major challenges in the modern business environment:
A need for a cross-enterprise perspective for better business intelligence.
A similar need for consistency across customer records for improved transaction management.
An ability to provide data governance at the enterprise level.
A requirement to coexist with the existing information technology infrastructure.

Description
A logical view of the MDM Hub, the data flow through the Hub, and the physical architecture of the Hub are
described in the following sections.

Logical View
A logical view of the MDM Hub is shown below:

The Hub supports access to data through batch, real-time and/or asynchronous messaging. Typically, this
access is supported through a combination of data integration tools, such as Informatica PowerCenter, and
embedded Hub functionality. In order to master the data in the Hub optimally, the source data needs to be
analyzed. This analysis typically takes place using a data quality tool, such as Informatica Data Quality.
The goal of the Hub is to master data for one or more domains within a Customer's environment. The MDM Hub
maintains a significant amount of metadata in order to support data mastering functionality, such as
lineage, history, survivorship and the like. The MDM Hub data model is completely flexible and can start from a
Customer's existing model or an industry standard model, or a model may be created from scratch.
Once the data model has been defined, data needs to be cleansed and standardized. The MDM Hub provides an
open architecture that allows a Customer to use any cleanse engine already in place, and it
provides an optimized interface for Informatica Data Quality.
Data is then matched in the system using a combination of deterministic and fuzzy matching. Informatica Identity
Resolution is the underlying match technology in the Hub; its interfaces have been optimized for Hub
use and abstracted so that they are easily leveraged by business users.
After matching has been performed, the Hub can consolidate records by linking them together to produce a registry
of related records or by merging them to produce a Golden Record or a Best Version of the Truth (BVT). When a
BVT is produced, survivorship rules defined in the MDM trust framework are applied such that the appropriate
attributes from the contributing source records are promoted into the BVT.
The BVT provides a basis for identifying and managing relationships across entities and sources. By building on
top of the BVT, the MDM Hub can expose cross-source or cross-entity relationships that are not visible
within an individual source.
A data governance framework is exposed to data stewards through the Informatica Data Director (IDD). IDD
provides data governance task management functionality, rudimentary data governance workflows, and data
steward views of the data. If more complex workflows are required, external workflow engines can be easily
integrated into the Hub. Individual views of data from within the IDD can also be exposed directly into applications
through Informatica Data Controls.
There is an underlying security framework within the MDM Hub that provides fine-grained control of access to
data within the Hub. The framework supports configuring the security policies locally or consuming them
from external sources, based on a customer's desired infrastructure.

Data Flow
A typical data flow through the Hub is shown below:

Implementations of the MDM hub start by defining the data model into which all of the data will be consolidated.
This target data model will contain the BVT and the associated metadata to support it. Source data is brought into
the hub by putting it into a set of Landing Tables. A Landing Table is a representation of the data source in the
general form of the source. There is an equivalent table known as a Staging Table, which represents the source
data, but in the format of the Target Data model. Therefore, data needs to be transformed from the Landing Table
to the Staging table, and this happens within the MDM Hub as follows:
1. The incoming data is run through a Delta Detection process to determine if it has changed since the last
time it was processed. Only records that have changed are processed.
2. Records are run through a staging process which transforms the data to the form of the Target Model. The
staging process is a mapping within the MDM Hub which may perform any number of standardization,
cleansing or transformation processes. The mappings also allow for external cleanse engines to be invoked.
3. Records are then loaded into the staging table. The pre-cleansed version of the records is stored in a
RAW table, and records that are inappropriate to stage (for example, records with structural deficiencies
such as a duplicate PKEY) are written to a REJECT table to be manually corrected at a later time.
The data in the Staging Table is then loaded into the Base Objects. This process first applies trust scores to
the attributes for which trust has been defined. Trust scores represent the relative survivorship of an attribute and are
calculated at the time the record is loaded, based on the currency of the data, the data source, and other
characteristics of the attribute.
Records are then pushed through a matching process which generates a set of candidates for merging. Depending
on which match rules caused a record to match, the record will be queued either for automatic merging or for
manual merging. Records that do not match will be loaded into the Base Object as unique records. Records
queued for automatic merge will be processed by the Hub without human intervention; those queued for manual
merge will be displayed to a Data Steward for further processing.

All data in the hub is available for consumption as a batch, as a set of outbound asynchronous messages or
through a real-time services interface.

Physical Architecture
The MDM Hub is designed as a three-tier architecture. These tiers consist of the MDM Hub Store, the MDM Hub
Server(s) (which include the Cleanse Match Servers) and the MDM User Interface.
The Hub Store is where business data is stored and consolidated. The Hub Store contains common information
about all of the databases that are part of an MDM Hub implementation. It resides in a supported database server
environment. The Hub Server is the run-time component that manages core and common services for the MDM
Hub. The Hub Server is a J2EE application, deployed on a supported application server that orchestrates the data
processing within the Hub Store, as well as integration with external applications. Refer to the latest Product
Availability Matrix for which versions of databases, application servers, and operating systems are currently
supported for the MDM Hub.
The Hub may be implemented in either a standard architecture or in a high availability architecture. In order to
achieve high availability, Informatica recommends the configuration shown below:

This configuration employs a properly sized DB server and application server(s). The DB server is configured as
multiple DB cluster nodes. The database is distributed across a SAN architecture. The application server requires

sufficient file space to support efficient match batch group sizes. Refer to the MDM Sizing Guidelines to properly
size each of these tiers.
Database redundancy is provided through the use of the database cluster, and application server redundancy is
provided through application server clustering.
To support geographic distribution, the HA architecture described above is replicated in a second node, with
failover provided using a log replication approach. This configuration is intended to support Hot/Warm or Hot/Cold
environments, but does not support Hot/Hot operation.

Leveraging PowerCenter Concurrent Workflows

Leveraging PowerCenter Concurrent Workflows


Challenge
Before the introduction of PowerCenter's Concurrent Workflow feature, customers would make copies of workflows
and run them under different names. This not only caused additional work, but also created maintenance issues
when the workflow logic changed. With PowerCenter's Concurrent Workflow feature, it is now possible to run
more than one instance of a workflow at the same time.

Description
Use Case Scenarios
Message Queue Processing
When data is read from a message queue, the data values in the queue can be used to determine which source
data to process and which targets to load the processed data into. In this scenario, different instances of the same
workflow should run concurrently, with different connection parameters passed to each instance
depending on the values read from the message queue. One example is a hosted data warehouse for 120
financial institutions where it is necessary to execute workflows for all of the institutions within a small time frame.

Web Services
Different consumers of a web service need the capability to launch workflows to extract data from different external
systems and integrate it with internal application data. Each instance of the workflow can accept different
parameters to determine where to extract the data from and where to load the data to. For example, the Web
Services Hub needs to execute multiple instances of the same web service workflow when web services requests
increase.

Configuring Concurrent Workflows


One option is to run concurrent instances with the same instance name. When a workflow is configured this way,
the Integration Service uses the same variables and parameters for each run, and the Workflow Monitor
displays the Run Id to distinguish between the runs.
Informatica recommends implementing Concurrent Workflows with unique instance names rather than the same
name with different Run Id values. With this option, the workflow is configured to allow concurrent runs only with
unique instance names: each workflow instance is configured with its own name, and a separate parameter
file can be used for each instance. The Integration Service can persist variables for each workflow instance, and when
the workflow is executed, it runs only the configured instances of the workflow.
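As an illustration, uniquely named instances can also be started from the command line with pmcmd. The sketch below assumes a workflow named wf_load_institution with two configured instances and a separate parameter file for each; the service, domain, user, folder, file and instance names are placeholders, and the -rin (run instance name) option is assumed to be available in the PowerCenter version in use:
pmcmd startworkflow -sv INT_SVC -d DOMAIN_DEV -u Administrator -p password -f FIN_DWH -paramfile /infa/parmfiles/bank_a.parm -rin wf_load_bank_a wf_load_institution
pmcmd startworkflow -sv INT_SVC -d DOMAIN_DEV -u Administrator -p password -f FIN_DWH -paramfile /infa/parmfiles/bank_b.parm -rin wf_load_bank_b wf_load_institution
Each command starts a separate instance of the same workflow, so both loads can run at the same time without copying the workflow.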

Tips & Techniques


There are several tips and techniques that should be considered for the successful implementation of Concurrent
Workflows. If the target is a database system, database partitioning can be used to prevent contention issues when
inserting into or updating the same table. When database partitioning is used, concurrent writes to the same table are
less likely to encounter deadlock issues.
Competing resources such as lookups are another concern that should be addressed when running
Concurrent Workflows. Lookup caches, as well as log files, should be exclusive to each workflow instance to avoid
contention.
Partitioning should also be considered. Mapping partitioning (data partitioning) is not affected by the Concurrent
Workflow feature and can be used with minimal impact.
On the other hand, parameter files should be created dynamically for the dynamic concurrent workflow option. This
requires a methodology for generating the parameter files at run time. A database-driven option can be used,
maintaining the parameter values in database tables; during the execution of the Concurrent Workflows, the
parameter files are then generated from the database.
